Remotery

Staff Site Reliability Engineer

Posted 1 hour ago

This is a fully remote position, open to applicants in California.

📋 Description

• Spearhead the creation of Domino's internal AI-driven reliability tools, encompassing systems that evaluate tickets, logs, traces, and documentation to assist teams in resolving outages more swiftly and with reduced recurring effort.

• Enhance the observability coverage and signal quality for our most vital customer-facing systems, providing engineers with more resources throughout the development and support lifecycle.

• Take end-to-end ownership of incident response, from detection through remediation, ensuring that each problem area ends up better documented, better understood, and less prone to recurrence.

• Direct the development of customer and user-facing observability tools integrated within our products.

• Establish and refine SLO/SLI frameworks for priority services, transforming abstract reliability objectives into quantifiable, actionable standards.

• Scale cloud operations practices for Domino’s single-tenant SaaS solution and collaborate with engineering teams to enhance the reliability and consistency of customer deployments and upgrades.

• Mentor fellow engineers and influence the practice of SRE at Domino, including incident response procedures, operational readiness standards, and a culture of post-incident learning.


⛳️ Requirements

• Extensive experience in Site Reliability Engineering, platform engineering, or a software engineering position with authentic, hands-on operational responsibility.

• Proficiency with Kubernetes, Linux, cloud platforms, and observability tools, along with the capability to utilize them for investigating complex, real-world production issues.

• A strong aptitude for identifying and addressing reliability gaps in technical products, tools, and processes.

• Solid software development skills in Python or Go, with a proven history of creating internal tools or services that are genuinely relied upon.

• Comfort in leading technically ambiguous projects and influencing direction across teams without requiring direct authority to accomplish tasks.

• A track record of improving reliability through engineering and automation rather than relying solely on manual firefighting.

• Excellent communication skills and substantial experience mentoring engineers or influencing technical decision-making within your team.

• Sound judgment regarding AI/LLM tools: understanding where they truly assist in operational workflows and where they introduce more noise than clarity.

• Bonus: Familiarity with LLM-based systems, retrieval workflows, SaaS platform operations, or developing tools for support or developer teams.


🏝️ Benefits

• Equity

• Company bonus or sales commissions/bonuses

• 401(k) plan

• Medical, dental, and vision benefits

• Wellness stipends
