
Site Reliability Lead
Posted 2 days ago

This is a fully remote position, open to applicants in the United Kingdom.
• Define and direct system architecture while balancing trade-offs among speed, scalability, maintainability, and security to achieve business objectives.
• Advocate for accountability from design through to production by ensuring systems are observable and meet established Service Level Objectives (SLOs). Promote continuous enhancement in platform reliability, performance, and efficiency.
• Lead Root Cause Analysis (RCA) efforts when problems arise and assist in optimizing the incident response process and framework.
• Drive automation initiatives within the team to minimize operational toil and enhance system efficiency.
• Maintain coding standards, encourage automated testing, and collaborate with the architecture community to promote technology adoption while sharing best practices across teams. Ensure production readiness standards for all services.
• Oversee technical estimation and feasibility assessments, ensuring that plans are realistic and correspond with team capacity. Participate in structured release planning and support post-release reviews.
• Mentor and coach engineers through constructive feedback, knowledge sharing, and motivation. Cultivate alignment and assist the team in rallying around technical solutions and objectives.
• Collaborate closely with Product Managers, Engineering Managers, and other engineers to align technical direction with product strategy. Clearly communicate complex technical concepts to both technical and non-technical stakeholders.
• Extensive professional experience in SRE, DevOps, or Platform Engineering involving complex, scalable systems.
• In-depth expertise in AWS and distributed cloud architectures.
• Proven track record of operating platforms that handle a high volume of requests (~1000 req/sec).
• Advanced skills in Terraform and configuration management tools.
• Strong programming skills in Python, Go, or a similar language for automation and tooling.
• Significant experience with monitoring and observability platforms (e.g., DataDog, Prometheus, or equivalent), along with incident/problem management.
• Expert knowledge of distributed systems, microservices, and resilience patterns.
• Practical experience with containerization and orchestration technologies (Docker, Kubernetes, ECS).
• Demonstrated experience in building and maintaining CI/CD pipelines for automated deployments.
• Proven ability to mentor and support the professional growth of fellow engineers.
Bonus Skills
• Experience with chaos engineering and reliability testing.
• Familiarity with security best practices and compliance frameworks.
• Background in agile and lean methodologies (Scrum/Kanban).
• Contributions to open-source projects or the SRE community.
• A dedicated wellbeing team that promotes initiatives such as mindfulness, lunch and learns, manager training, mental health first aid training, and much more!
• 32 days of holiday (plus Bank Holidays), consisting of 25 days of annual leave plus 7 additional company-wide days given during Easter, Summer, and Christmas.
• Life Assurance provided at 3x annual salary.
• Comprehensive wellness benefits offered by AIG Smart Health, which includes a 24/7 virtual GP service, mental health support, counseling, and personalized health checks.
• Private Dental Insurance through Bupa.
• Salary-sacrifice pension plan provided by Scottish Widows.
• Enhanced maternity and adoption leave (20 weeks full pay) and paternity leave (6 weeks full pay).
• Five complimentary return-to-work maternity coaching sessions to assist you in adapting to this exciting new phase of life!
• Access to services such as Calm and Bippit for financial wellbeing coaching.
• All our roles support flexible working arrangements, and we are open to discussing what this means for you.
• Social committees that organize team, office, and company-wide events to foster connections and celebrate achievements.
• A dedicated professional development training budget for CPD courses, upskilling resources, and professional memberships.
• Opportunity to volunteer with a charity of your choice for one day each year.
• Dog-friendly offices!