
Senior Site Reliability Engineer
Posted May 19

This is a fully remote position, open to applicants in Australia.
Responsibilities
• Build and maintain observability solutions using platforms such as Datadog, Prometheus, and Grafana.
• Lead incident management: coordinate response efforts, diagnose issues, and determine follow-up actions.
• Collaborate with product engineering teams to design dependable systems, recover from incidents, and learn from failures.
• Work with teams to define and maintain SLOs, monitoring, and alerting strategies that ensure reliability at scale.
• Develop and implement automation and support tools to enhance system resilience, ensure operational safety, and minimize operational overhead.
• Oversee the creation and upkeep of runbooks, alert definitions, and incident response protocols.
• Participate in on-call rotations providing 24/7 support for critical production systems.
Requirements
• At least 6 years of experience in Site Reliability Engineering or comparable DevOps roles focused on system reliability and incident management.
• Extensive experience with contemporary monitoring stacks including Prometheus, Grafana, and Datadog.
• Proficiency in at least one programming language, such as Python, Go, Rust, C/C++, or Java.
• Mastery of Infrastructure as Code tools, including Terraform and Helm.
• Familiarity with at least one major cloud service provider (AWS, GCP, Azure).
• Strong communication skills, capable of leading incident responses and effectively collaborating across teams.
• Experience with, and willingness to participate in, on-call rotations and emergency response processes.
• A high degree of autonomy and a proactive approach to identifying and resolving issues.
• Exceptional problem-solving abilities and a systematic approach to troubleshooting complex challenges.
Benefits
• Health, dental, vision, life, and disability insurance.
• 401(k) plan and flexible spending accounts.
• Flexible time off.
• Option to work from the Atlanta or San Francisco offices.
Work Life Group