Remotery

Site Reliability Engineer

Posted Jun 21

This is a fully remote position, open to applicants in the United States.

📋 Description

• Take charge of the migration from Heroku to Google Cloud Platform: own the architecture, the execution, and a seamless cutover.

• Develop and maintain the Postgres core, Fivetran pipeline, BigQuery data layer, and Hex reporting infrastructure.

• Improve the most impactful critical paths: optimize key backend code and the most demanding third-party syncs to maintain performance as volume increases.

• Manage monitoring, alerting, cost reduction, and proactive scaling: identify issues early, control expenditures, and anticipate growth instead of merely responding to it.

• Lead incident responses and create post-mortems that transform outages into lasting solutions and a more knowledgeable team.

• Establish high operational standards across engineering and elevate others to meet them.


⛳️ Requirements

• Ownership of production reliability: A proven history of personally owning production reliability at significant scale, with concrete examples of incidents you managed, resolved, and prevented from recurring, rather than ones you merely participated in. This is a core responsibility, not an ancillary one.

• Experience with infrastructure migrations: Genuine experience managing a cloud migration from start to finish, not just contributing to one. Proficiency with GCP (or a similar cloud platform) and infrastructure-as-code, and familiarity with the failure modes of distributed systems.

• Expertise in observability and proactive operations: You create monitoring and alerting systems that detect issues before users are affected. You know what to measure, what to alert on, and what is just background noise.

• High agency: You identify the most impactful reliability challenges and take the initiative to address them without waiting for assignment. You don’t wait for outages to validate your efforts.

• Integration of AI in your workflows: Specific instances demonstrating how AI has enhanced your debugging, automation, or operational processes for increased speed and reliability.


🏝️ Benefits

• Lead the migration from Heroku to Google Cloud Platform.

• Create and sustain the Postgres core, Fivetran pipeline, BigQuery data layer, and Hex reporting infrastructure.

• Optimize the most critical paths: enhance key backend code and our most substantial third-party syncs to ensure performance as volume increases.

• Manage monitoring, alerting, cost reduction, and proactive scaling: identify issues early, manage costs effectively, and stay ahead of growth rather than simply reacting.

• Lead incident response and develop post-mortems that convert outages into lasting solutions and a more adept team.

• Set high operational standards across the engineering team and assist others in achieving them.
