
DevOps Reliability Engineer
Posted Jun 20

This is a fully remote position, open to applicants in the United Kingdom.
Responsibilities
• Oversee and enhance the health, availability, performance, and cost-effectiveness of production systems based on Azure.
• Utilize application, database, and infrastructure telemetry to pinpoint performance challenges, bottlenecks, and reliability threats.
• Optimize Azure services and platform settings to achieve maximum performance, resilience, and resource efficiency.
• Collaborate with engineering teams to suggest and implement actionable, data-informed enhancements to reliability, scalability, and operational efficiency.
• Develop and uphold operational documentation, runbooks, and troubleshooting guides to ensure consistent incident response and ongoing operations.
• Assist Tech Support and Sustained Engineering by running approved SQL queries and performing database backups and restorations for troubleshooting purposes.
• Evaluate the impact of partner integrations and customer usage patterns on system performance and cloud expenditures.
• Investigate complex production issues, conduct root cause analysis, and help resolve reliability and performance problems.
• Contribute to continuous enhancement in deployment procedures, system stability, and operational preparedness.
• Carry out additional job-related tasks and responsibilities as assigned.
Requirements
• Bachelor's degree in Computer Science, Information Technology, or a related field, or equivalent experience.
• Over 8 years of experience in DevOps, Site Reliability Engineering, Cloud Engineering, or comparable positions.
• Extensive hands-on experience with Microsoft Azure, particularly Azure SQL, Azure Functions, Azure App Service, and Azure container services (AKS, Container Apps, or similar).
• Proficient in reading and interpreting telemetry, logs, metrics, and resource usage data, with the ability to diagnose issues and propose solutions.
• Experience working with production systems that demand high availability and reliability.
• Comfortable owning tasks end to end, from identifying an issue to implementing the improvement.
• Familiarity with adjusting pipelines, hosting configurations, and deployment workflows.
• Solid understanding of cloud cost drivers and strategies for usage optimization.
• Strong problem-solving abilities and the capability to collaborate effectively with both engineering and support teams.
• Ability to read and interpret application code to aid in troubleshooting, root cause analysis, and the identification of performance enhancement opportunities.
Benefits
• Wellness Benefits
• Opportunities for Professional Growth and Development
• Flexible Remote Work
• Volunteer Time Off
• Study Leave
• Employee Assistance Program