Remotery

Lead Site Reliability Engineer

Posted Jun 20

This is a fully remote position, open to applicants in Canada.

📋 Description

• Identify failure patterns throughout the organization.

• Analyze incidents and conduct post-incident reviews to uncover recurring technical root causes affecting customers, rather than addressing each incident individually.

• Prioritize the most significant improvement opportunities.

• Direct reliability efforts towards areas that significantly reduce Mean Time to Detect and Mean Time to Resolve, while also proactively preventing future incidents.

• Transform identified patterns into appropriate engineering interventions and influence the teams responsible for implementing them.

• Provide hands-on assistance to teams in their code through Merge Requests, pairing, code reviews, and active technical support, preferring the simplest interventions that prevent recurrence over more complex solutions.

• Share and promote improvements throughout the organization.

• Lead technical discussions during post-incident reviews and operational forums.

• Assist the Incident Response Team in enhancing its engineering practices by collaborating on real tasks, demonstrating effective engineering in context, and conducting internal learning sessions that transition the team from incident-response specialists to incident-response engineers.

• Collaborate across teams without direct authority.


⛳️ Requirements

• A minimum of 5 years of hands-on experience in Site Reliability, Platform, or Infrastructure Engineering within a large-scale, distributed production environment, with proficiency in at least one programming language (e.g., Python, Go, TypeScript, Java).

• Proven experience in promoting the adoption of reliability or platform patterns (e.g., progressive delivery, observability standards, resilience libraries, secret rotation) across teams not under your direct supervision, yielding measurable outcomes.

• Strong systems thinking with a clear preference for simple solutions: able to analyze an incident or design, identify the underlying problem class (e.g., retries, cascading failures, queuing behavior, partial failures, head-of-line blocking), and choose the simplest, most cost-effective intervention to address it.

• Comfortable selecting a post-deploy curl check over a full sandbox environment when a simpler intervention can prevent the same incident.

• Practical experience with the modern reliability stack: at least one major cloud platform (AWS, Google Cloud, or Azure), an observability platform (such as New Relic, Datadog, or Grafana), defining and operating against Service Level Objectives, continuous integration and deployment pipelines, and infrastructure-as-code (e.g., AWS CDK, Pulumi).

• Hands-on experience with AI and large language model (LLM) tooling in an engineering context, such as integrating LLMs into workflows or operational tools, or using AI effectively in your own engineering work.


🏝️ Benefits

• Health, wealth, and wellness programs.

• Long-term equity incentives.
