
Reliability Operations Engineer
Posted May 19

This is a fully remote position, open to applicants in Mexico.
• Oversee incident investigations during your region's daytime hours, ensuring timely updates, appropriate escalation, and support for the senior engineers leading the response.
• Address escalations from Tier 1 support by using established runbooks, metrics, logs, and diagnostics to resolve issues, escalating to Tier 3 when necessary.
• Revise runbooks and operational documentation in light of new issues, findings, and feedback, keeping all procedures clear and consistent.
• Deliver clear and precise updates during incidents, making sure information reaches the appropriate engineering and SRE personnel while facilitating organized incident coordination.
• Engage in discussions regarding root causes, share operational insights, and contribute to process enhancements that improve system stability and supportability.
• 2–4 years of experience in Reliability Operations, Site Reliability Engineering, DevOps, IT Operations, or a similar technical support role.
• Proficiency in Linux, including system navigation, log analysis, and performing basic diagnostics.
• Ability to interpret metrics, logs, and traces using tools such as Grafana/Prometheus, Google Cloud Monitoring, and OpenTelemetry.
• Familiarity with Jira or comparable ticketing systems.
• Strong sense of ownership and accountability regarding operational responsibilities.
• Competitive salary and performance-based bonuses.
• Comprehensive health, dental, and vision insurance.
• Generous paid time off and flexible work hours.
• Opportunities for professional development and career growth.