
DevOps Reliability Engineer
Posted 10 hours ago

This is a fully remote position, open to applicants in Australia.
• Oversee and enhance the health, availability, performance, and cost efficiency of production systems hosted on Azure.
• Use telemetry from applications, databases, and infrastructure to pinpoint performance issues, bottlenecks, and reliability risks.
• Adjust Azure services and platform settings to optimize performance, resilience, and resource utilization.
• Collaborate with engineering teams to propose and execute practical, data-driven enhancements to reliability, scalability, and operational efficiency.
• Develop and maintain operational documentation, runbooks, and troubleshooting guides to facilitate consistent incident response and ongoing operations.
• Assist Tech Support and Sustained Engineering by executing authorized SQL queries and conducting database backups and restores for troubleshooting purposes.
• Assess how partner integrations and customer usage trends influence system performance and cloud expenditure.
• Investigate complex production issues, perform root cause analysis, and lead the resolution of reliability and performance problems.
• Contribute to the ongoing enhancement of deployment processes, system stability, and operational readiness.
• Carry out other job-related tasks and responsibilities as assigned.
• Bachelor’s degree in Computer Science, Information Technology, or a related field, or equivalent experience.
• 8+ years of experience in DevOps, Site Reliability Engineering, Cloud Engineering, or comparable roles.
• Extensive hands-on experience with Microsoft Azure, particularly Azure SQL, Azure Functions, Azure App Service, and Azure container services (AKS, Container Apps, or similar).
• Proficient in reading and interpreting telemetry, logs, metrics, and resource usage data, and articulating issues along with solutions.
• Experience with production systems that demand high availability and reliability.
• Comfortable managing work from start to finish, from identifying issues to implementing improvements.
• Familiarity with adjusting pipelines, hosting configurations, and deployment workflows.
• Strong understanding of cloud cost drivers and usage optimization.
• Excellent problem-solving abilities and the capacity to work collaboratively with engineering and support teams.
• Ability to read and interpret application code to assist with troubleshooting, root cause analysis, and identifying opportunities for performance improvements.
• Wellness Benefits
• Opportunities for Professional Growth and Development
• Flexible Remote Work
• Volunteer Time Off
• Study Leave
• Employee Assistance Program