This is a fully remote position, open to applicants in the United States.

📋 Description

• Establish and initially operate the SRE practice, including the SLO framework, on-call rotation, and incident command process.

• Develop SLOs, manage the rotation, lead incident response for live banking customers, and write postmortem reports.

• Determine severity tiers, SLA commitments for each customer tier, and escalation procedures for production support.

• Establish operational standards across all four engineering lanes: sprint discipline, release rituals, code review standards, and change management documentation.

⛳️ Requirements

• A minimum of ten years of experience in engineering.

• At least five years of hands-on experience building SRE or platform operations functions within a software company catering to enterprise or regulated markets.

• Familiarity with organizations that deliver software to customers and manage it at scale, such as ServiceNow, MongoDB, AWS, GCP, or similar.

• Experience managing multi-tenant and multi-deployment-model infrastructure, understanding the complexities involved in the final stages.

• Ability to write SLOs that teams actually use.

• Experience establishing an on-call rotation from the ground up.

• Experience acting as the technical lead during production incidents, with an understanding of the consequences of lacking a defined process.

• Ability to build trust with senior engineers based on merit rather than title.

• Self-motivated individual dedicated to developing processes and infrastructure.

🏝️ Benefits

• Competitive base salary and significant equity opportunities.

• Strong preference for candidates in Atlanta, GA. Consideration for West Coast applicants on an individual basis. Remote work is an option for the right candidate.

VP of Site Reliability
