
Lead Site Reliability Developer
Posted 10 hours ago

This is a fully remote position open to applicants in Arizona and three other states.
• Lead consulting initiatives from the discovery phase through to delivery by ensuring stakeholder alignment on priorities, sequencing tasks, and communicating measurable outcomes.
• Establish a regular working rhythm and facilitate decision-making forums that surface risks, map dependencies, and hold owners accountable to timelines.
• Align stakeholders from product, platform, and engineering on reliability objectives and trade-offs using Service Level Objectives (SLOs) and error budgets.
• Collaborate frequently with engineering managers, product managers, Staff and Principal engineers, and platform leads to align on dependencies, decisions, and delivery timelines.
• Identify systemic risks across shared dependencies and coordinate remediation efforts across various teams to minimize recurring incidents.
• Foster change adoption by integrating reliability mechanisms into partner team routines, including planning, Post-Release Reviews (PRRs), and on-call practices.
• Design and implement reusable reliability mechanisms, templates, and tooling that can be utilized across multiple teams.
• Establish and refine production readiness review practices with partner teams to enhance launch quality and change safety.
• Drive the observability strategy for partner domains by enhancing signal quality, alerting philosophy, and operational dashboards.
• Lead complex incident investigations, ensuring that learnings result in sustainable fixes with assigned owners and verification processes.
• Conduct reliability-focused design and code reviews, guiding teams towards simpler and safer architectural solutions.
• Mentor Senior engineers and other consultants through pairing, reviews, and structured coaching to amplify impact.
• Collaborate with internal platform engineering to influence roadmaps and deliver shared capabilities that accelerate Site Reliability Engineering (SRE) adoption.
• Enhance CSRE Consulting playbooks and operational practices based on recurring patterns identified across teams.
• In-depth, hands-on understanding of SRE principles, including how SLO governance and error budget policies work in practice.
• Demonstrated ability to lead technical efforts across teams and influence without direct authority.
• Extensive experience in designing and troubleshooting distributed systems with cross-service failure modes.
• Proven experience in shaping observability and alerting strategies, along with improving operational signal quality.
• Strong expertise in Kubernetes and AWS, including governance and cost-related trade-offs.
• Ability to design reusable reliability automation and tooling that multiple teams can adopt.
• Experience in leading production readiness and resilience practices, including Disaster Recovery (DR) validation and controlled testing.
• Strong software engineering fundamentals, with the capability to deliver and review high-quality changes in enterprise-level codebases.
• Advanced incident analysis skills focused on reducing systemic risk and promoting organizational learning.
• Excellent communication skills, including the ability to create executive-ready summaries and clear technical diagrams.
• Health: Medical, vision, dental, and mental health benefits for you and your family, along with access to a health care concierge, and Flexible or Health Savings Accounts (FSA or HSA).
• Yourself: Complimentary concert tickets, generous paid time off including holidays, sick leave, and personal days.
• Wealth: 401(k) program with company matching, stock reimbursement program.
• Family: New parent programs including caregiver leave, plus support for fertility, adoption, foster care, or surrogacy.
• Career: Career and skill development initiatives with School of Live, tuition reimbursement, and student loan repayment options.
• Others: Volunteer time off and crowdfunding match.