
Senior Engineering Manager, Cloud Platform
Posted May 11

This is a fully remote position, open to applicants in the United States.
• Lead software engineering teams in delivering infrastructure-as-code solutions for managing cloud infrastructure.
• Create governance frameworks and mechanisms that empower application development teams to provision infrastructure autonomously while ensuring adherence to best practices and controls.
• Offer documentation, training, and support to help feature development teams fully utilize self-service capabilities.
• Recruit skilled site reliability engineers and a line manager to develop and oversee the SRE team.
• Enhance incident management by defining and documenting incident processes and practices for your SRE team and the application feature teams.
• Promote professionalism in incident management across the engineering organization through training and process implementation.
• Instill a design-before-build discipline by facilitating lightweight design documents, architectural decision records, and working group reviews.
• Utilize design reviews, code reviews, and blameless retrospectives to foster a culture of quality and excellence in engineering.
• Proven experience in leading teams that manage SaaS service infrastructure.
• Extensive hands-on experience in deploying and operating production infrastructure on public cloud platforms, with a strong preference for AWS; familiarity with Azure and GCP is a plus.
• Strong expertise in Infrastructure as Code, including Terraform; prior experience with Crossplane and GitOps patterns is highly preferred.
• Experience in managing large-scale production Kubernetes environments.
• Comprehensive understanding of security best practices, including zero trust architecture, secrets management, identity and access management, and software supply chain security.
• Experience in building and maintaining self-service infrastructure platforms that empower application development teams, while balancing self-service capabilities with maintainability and security.
• Background in leading or establishing SRE functions, including incident management processes, on-call programs, SLO/SLA definitions, and operational runbooks.
• In-depth hands-on experience with observability, including application performance management, logs and traces, golden signals, and service-specific metrics.
• Expertise in guiding infrastructure teams to convert business and product needs into technical requirements and engineering outputs.
• Demonstrated ability to hire, develop, and retain high-performing engineers and engineering managers in remote or distributed settings.
• Health insurance
• Vision insurance
• Dental insurance
• Flexible vacation policy
• Generous parental leave