
Engineering Manager, Infrastructure Platforms
Posted May 20

This is a fully remote position, open to applicants in India.
• Recruit and lead a high-achieving team of Site Reliability Engineers in India who embody our values.
• Conduct regular one-on-one meetings with each team member, offering coaching and consistent feedback regarding their performance.
• Organize and continually enhance the team's shift and weekend coverage model for Dedicated migrations.
• Take ownership of the operational execution of Dedicated Geo migrations and cutovers, encompassing planning, pre-cutover preparation, live execution, and post-cutover validation and cleanup.
• Ensure that the team delivers prompt, high-quality responses to Geo-related escalations from Support and internal partners.
• Encourage technical decision-making within the team, intervening to make final decisions when required, particularly during critical migrations or incidents.
• Develop and maintain runbooks, guardrails, and post-cutover reviews to ensure the team operates systematically rather than through improvisation, especially during ramp-up periods.
• Work together with core Geo, Dedicated migrations, and other Infrastructure teams to identify and prioritize engineering investments that enhance migration tooling and processes.
• Define, monitor, and report on essential operational metrics such as escalation volume, internal escalation rates, cutover coverage, response times, and team health indicators, using this data to drive continuous improvement.
• Participate in the Incident Management on-call rotation to help meet availability goals for GitLab.com, collaborating with reliability engineers and development team members.
• More than 3 years of experience managing SRE, infrastructure, or platform engineering teams operating highly available distributed systems at scale, preferably in a SaaS environment with customer-facing SLAs.
• Proven ability to lead in a remote, high-performance setting, collaborating across various time zones and cultures.
• Experience overseeing or significantly contributing to large-scale data migrations where customer data integrity and downtime risk must be meticulously managed.
• Strong background in infrastructure, including cloud platforms, observability, incident response, and distributed multi-tenant architectures.
• Exceptional communication and interpersonal skills, with the capacity to translate complex technical concepts and risk trade-offs into clear, actionable insights for both technical and non-technical stakeholders, including customers.
• Strong problem-solving skills and keen attention to detail, with a focus on delivering high-quality, low-risk operational outcomes in a fast-paced, dynamic environment.
• Alignment with our company values and a dedication to working in accordance with those values.
• Benefits to support your health, finances, and well-being
• Flexible Paid Time Off
• Team Member Resource Groups
• Equity Compensation & Employee Stock Purchase Plan
• Growth and Development Fund
• Parental leave
• Home office support