This is a fully remote position, open to applicants in the United States.

📋 Description

• Take charge of the migration from Heroku to Google Cloud Platform, owning the architecture, the execution, and a seamless cutover.

• Develop and maintain the Postgres core, Fivetran pipeline, BigQuery data layer, and Hex reporting infrastructure.

• Enhance the critical paths that are most impactful: optimize key backend code and the most demanding third-party syncs to maintain performance as volume increases.

• Manage monitoring, alerting, cost reduction, and proactive scaling: identify issues early, control expenditures, and anticipate growth instead of merely responding to it.

• Lead incident responses and create post-mortems that transform outages into lasting solutions and a more knowledgeable team.

• Establish high operational standards across engineering and elevate others to meet them.

⛳️ Requirements

• Ownership of production reliability: A proven history of personally ensuring production reliability at significant scale, with concrete examples of incidents you managed, resolved, and prevented from recurring, not ones you merely participated in. This is a core responsibility, not an ancillary one.

• Experience with infrastructure migrations: Genuine experience managing a cloud migration from start to finish, not just contributing to one. Proficient in GCP (or a similar cloud platform), infrastructure-as-code, and the failure modes associated with distributed systems.

• Expertise in observability and proactive operations: You create monitoring and alerting systems that detect issues before users are affected. You know what to measure, what to alert on, and what is just background noise.

• High agency: You identify the most impactful reliability challenges and take the initiative to address them without waiting for assignment. You don’t wait for outages to validate your efforts.

• Integration of AI in your workflows: Specific instances demonstrating how AI has enhanced your debugging, automation, or operational processes for increased speed and reliability.

🏝️ Benefits

• Lead the migration from Heroku to Google Cloud Platform.

• Create and sustain the Postgres core, Fivetran pipeline, BigQuery data layer, and Hex reporting infrastructure.

• Optimize the most critical paths: enhance key backend code and our most substantial third-party syncs to ensure performance as volume increases.

• Manage monitoring, alerting, cost reduction, and proactive scaling: identify issues early, manage costs effectively, and stay ahead of growth rather than simply reacting.

• Lead incident response and develop post-mortems that convert outages into lasting solutions and a more adept team.

• Set high operational standards across the engineering team and assist others in achieving them.

Site Reliability Engineer

