
Senior Manager, Site Reliability Engineering
Posted 5 days ago

This is a fully remote position, open to applicants in the United States.
• Lead and develop our SRE team of approximately 10 engineers, focusing on hiring, retention, career growth, and performance management across various time zones (US, HK, NZ).
• Establish strategic partnerships with product engineering teams — transitioning SRE from a reactive, ticket-based approach to a proactive co-ownership of reliability outcomes.
• Scale our multi-tenant infrastructure to facilitate new customer onboarding and accommodate expanding patient populations.
• Take ownership of cloud cost management and FinOps practices, creating frameworks that balance cost efficiency with reliability and performance.
• Promote developer self-service and platform engineering by creating self-service capabilities for product teams to manage routine operations independently, without the need for SRE tickets.
• Set SLOs/SLIs for essential services and enhance alert quality to ensure every notification is significant.
• Ensure the SRE team is effectively utilizing AI tools in their workflows — employing tools like Claude Code for infrastructure as code generation, log analysis, root cause investigations, and automating repetitive tasks — at the same proficiency as the rest of the engineering team.
• You possess over 6 years of experience managing an SRE team and more than 10 years of practical SRE or infrastructure engineering experience.
• You are highly familiar with our core technology stack: Kubernetes, GCP (GKE, Cloud SQL, Pub/Sub, GCS), Terraform, Helm, ArgoCD, PostgreSQL, and Prometheus/Grafana.
• You have strong programming abilities in Python and/or Go and are adept at writing and reviewing infrastructure tooling code — including the use of AI coding tools.
• You have experience with CI/CD pipelines (GitHub Actions) and a proven history of enhancing developer tooling and automation.
• You possess sound judgment regarding build versus buy decisions — you consistently choose the right solution over the easiest one, and you are comfortable developing internal tools when existing solutions are inadequate.
• You have experience leading teams across multiple time zones and a proven track record of nurturing engineers into proficient technical contributors.
• Financial Well-Being: Our dedication to attracting and retaining top talent starts with a competitive base salary and equity opportunities. We also provide a performance-based bonus program, 401k matching, and regular compensation reviews to acknowledge and reward outstanding contributions.
• Physical Well-Being: We emphasize the health and well-being of our employees and their families by offering comprehensive medical, dental, and vision coverage. Your health is important to us, and we invest in ensuring you have access to quality healthcare.
• Mental Well-Being: We recognize the significance of mental health in enhancing productivity and maintaining work-life balance. To support this, we provide initiatives such as No-Meeting Fridays, monthly company holidays, access to mental health resources, and a generous flexible time-off policy. Furthermore, we embrace a remote-first culture that fosters collaboration and flexibility, allowing our team members to excel from any location.
• Professional Development: Cultivating internal talent is a key priority for Clover. We provide learning programs, mentorship, professional development funding, and regular performance feedback and reviews.
• Additional Perks: Employee Stock Purchase Plan (ESPP) offering discounted equity opportunities.
• Reimbursement for office setup costs.
• Monthly cell phone and internet stipend.
• Remote-first culture, promoting collaboration with global teams.
• Paid parental leave for all new parents.
• And much more!