
Principal Site Reliability Engineer
Posted May 7

• Define and spearhead the infrastructure and reliability strategy across the platform.
• Collaborate with engineering teams to design scalable and resilient systems.
• Enhance build, testing, and deployment processes to improve speed and stability.
• Establish and maintain best practices for CI/CD, monitoring, and observability.
• Lead incident response efforts and promote continuous improvement following incidents.
• Automate workflows to minimize operational toil and mitigate risk.
• Mentor engineers and cultivate a culture of operational excellence.
• Make informed strategic decisions regarding build versus buy, weighing speed, quality, and sustainability.
• A minimum of 8 years of experience in Site Reliability Engineering or DevOps roles, with at least 2 years in a Principal or Lead capacity.
• Demonstrated experience in modernizing infrastructure and scaling initiatives within high-growth environments.
• Strong expertise in Python programming.
• In-depth knowledge of cloud platforms and container orchestration tools such as AWS ECS and EKS.
• Extensive experience in designing and optimizing CI/CD pipelines using tools like GitHub Actions and Buildkite.
• Proficiency in infrastructure-as-code tools such as Terraform.
• Strong understanding of monitoring, observability, and performance optimization practices.
• Upper-intermediate proficiency in spoken and written English.
• Healthcare coverage.
• Flexible work arrangements.
• Opportunities for professional development.