
Senior Engineering Manager, Cloud Platform
Posted Jun 21

This is a fully remote position, open to applicants in the United States.
• Oversee software engineering teams that deliver infrastructure-as-code for managing cloud infrastructure.
• Create governance structures and systems that empower application development teams to provision infrastructure independently, ensuring adherence to best practices and controls.
• Provide documentation, training, and support so that feature development teams use self-service capabilities effectively.
• Recruit skilled site reliability personnel and a line manager to enhance and supervise the SRE team.
• Professionalize incident management by defining and documenting incident processes and practices for both your SRE team and the application feature teams.
• Promote a culture of incident professionalism throughout the engineering organization by fostering training and the adoption of processes.
• Implement a design-before-build approach. Facilitate the creation of lightweight design documents, architectural decision records, and working group reviews.
• Utilize design reviews, code reviews, and blameless retrospectives to cultivate a culture of quality and excellence in engineering.
• Proven experience leading teams that operate SaaS service infrastructure.
• Extensive hands-on experience with deploying and managing production infrastructure on public cloud platforms (AWS is strongly preferred; familiarity with Azure and GCP is a plus).
• Strong expertise in Infrastructure as Code, including Terraform; experience with Crossplane and GitOps patterns is highly preferred.
• Experience in managing production Kubernetes environments at scale.
• Comprehensive understanding of security best practices, including zero trust architecture, secrets management, identity and access management, and software supply chain security.
• Experience in building and maintaining self-service infrastructure platforms that empower application development teams, while balancing self-service and developer productivity with maintainability and security.
• Experience in leading or establishing SRE functions, encompassing incident management processes, on-call programs, SLO/SLA definitions, and operational runbooks.
• Extensive hands-on experience with observability, including application performance management, logs and traces, and key signals and service-specific metrics.
• Expertise in guiding infrastructure teams to translate business and product needs into technical requirements and engineering outputs.
• Demonstrated ability to recruit, develop, and retain high-performing engineers and engineering managers in remote or distributed settings.
• Health insurance
• Vision insurance
• Dental insurance
• Flexible vacation policy
• Generous parental leave