Remotery

Site Reliability Engineer

Posted 6 days ago

This is a fully remote position, open to applicants anywhere in the world.

📋 Description

• Collaborate with service teams to establish meaningful SLIs and SLOs that are rooted in customer experience, and develop error budget policies that translate them into engineering decisions.

• Take ownership of and improve the Operational Readiness Review (ORR) process, evaluating new services and major changes for observability, alerting, runbooks, capacity, and graceful degradation.

• Enhance the incident-to-improvement pipeline by linking postmortem insights to operational readiness shortcomings, pinpointing recurring failure patterns, and driving systematic resolutions.

• Serve as the reliability authority that teams consult for architecture reviews, failure mode analyses, dependency mapping, and resilience design.

• Identify and assess operational toil across the organization, advocating for or creating automation solutions that eliminate it.

• Assist teams in developing sustainable on-call practices, focusing on alert quality, escalation procedures, runbook coverage, and noise reduction.

• Monitor and report on the overall operational maturity of the organization, highlighting systemic deficiencies and promoting remediation efforts.


⛳️ Requirements

• Possess over 7 years of experience in SRE, production engineering, or roles focused on reliability, including shaping SRE practices and fostering their adoption within engineering teams.

• Have a software engineering mindset—capable of writing code and building tools, not just configuring them.

• Demonstrate hands-on experience in defining and operationalizing SLOs/SLIs at scale, including error budget policies that have genuinely influenced engineering decisions.

• Hold extensive experience in incident response, facilitating postmortems, and transforming incident learnings into systemic enhancements.

• Have experience with large-scale multi-tenant systems (bonus: managed database platforms or Postgres).

• Be proficient with cloud infrastructure (AWS preferred) and infrastructure-as-code (Pulumi preferred, Terraform/CDK also acceptable).

• Communicate effectively and persuasively; this position requires the ability to influence without authority in a distributed organization.

• Have experience working in asynchronous or globally distributed teams.

• Be motivated by empowering other teams to be more effective rather than being the sole problem-solver.


🏝️ Benefits

• Fully Remote

• ESOP

• Tech Allowance

• Health Benefits

• Annual Off-Sites

• Flexible Work

• Professional Development
