Remotery

Staff Site Reliability Engineer

Posted May 11

This is a fully remote position, open to applicants in Argentina.

📋 Description

• Spearhead the creation of Domino's internal AI-driven reliability tools, including systems that evaluate tickets, logs, traces, and documentation to help teams resolve outages quickly and with less repeated effort.

• Enhance the observability scope and signal integrity for our most essential customer-facing systems, providing engineers with better resources throughout the development and support lifecycle.

• Manage incident response comprehensively, from detection to resolution, ensuring each problem area is better documented, understood, and less prone to recurrence.

• Lead the creation of customer and user-oriented observability tools integrated within our products.

• Establish and refine SLO/SLI frameworks for priority services, transforming abstract reliability objectives into measurable, actionable standards.

• Scale cloud operations practices for Domino’s single-tenant SaaS solution, collaborating with engineering teams to enhance the reliability and consistency of customer deployments and upgrades.

• Mentor fellow engineers and influence the practice of SRE at Domino, including incident response processes, operational readiness expectations, and a culture of learning from incidents.


⛳️ Requirements

• Extensive experience in Site Reliability Engineering, platform engineering, or a software engineering role with significant, hands-on operational responsibility.

• Proficiency in Kubernetes, Linux, cloud platforms, and observability tools, and the ability to use them to diagnose intricate, real-world production issues.

• A strong aptitude for identifying and bridging reliability gaps in technical products, tools, and processes.

• Solid software engineering capabilities in Python or Go, with a proven history of developing internal tools or services that are genuinely relied upon.

• Comfort in leading technically ambiguous projects and influencing direction across teams without requiring direct authority to accomplish tasks.

• A background in enhancing reliability through engineering and automation, rather than solely addressing issues manually.

• Excellent communication skills and genuine experience mentoring engineers or influencing technical decision-making within your team.

• Sound judgment regarding AI/LLM tools: you understand where they truly benefit operational workflows and where they may create noise rather than signal.

• Bonus: Familiarity with LLM-based systems, retrieval workflows, SaaS platform operations, or developing tools for support or developer teams.


🏝️ Benefits

• We strongly believe in the importance of cultivating a diverse team and welcome candidates from all backgrounds, genders, ethnicities, abilities, and sexual orientations to apply.

• We value a growth mindset, encouraging high-performing creative individuals who tackle challenges and identify opportunities for success.

• We appreciate individuals who pursue truth and speak honestly, allowing them to be their authentic selves at work.

• We recognize and support those who believe in the possibility of continuous improvement. At Domino, everything is a work in progress, and we can always enhance our efforts.

• We promote an environment of teaching and learning, equipping employees with the resources necessary for success in their roles and within the company.
