Remotery

Principal Service Reliability Engineer

Posted Jun 20

This is a fully remote position, open to applicants in Virginia and 4 other states.

📋 Description

• Develop and enhance build and deployment pipelines for secure, dependable production releases.

• Oversee the design and management of pre-production and production cloud infrastructure to guarantee high availability, performance, and security.

• Collaborate with Engineering teams to streamline the release flow from development to testing and production.

• Establish and enforce monitoring, alerting, and incident response protocols.

• Direct intricate troubleshooting efforts, conduct root cause analysis, and promote systematic post-incident enhancements.

• Assess, recommend, and integrate new infrastructure technologies and services.

• Ensure platforms comply with or surpass healthcare security and compliance standards.

• Promote and facilitate the adoption of SRE best practices (SLOs, SLIs, error budgets, reliability engineering standards).

• Act as a technical leader and advisor across Service Reliability, DevOps, and engineering teams.

• Guide engineers through design reviews, knowledge sharing, and best practice advisement.

• Influence system design and architectural choices to enhance scalability and resilience.

• Collaborate with teams to prioritize reliability initiatives and minimize operational risk.

• Assist in defining engineering standards, best practices, and operational runbooks.

• Cultivate a culture of ownership, accountability, reliability, and continuous improvement.


⛳️ Requirements

• Bachelor’s degree in Computer Science, Computer Engineering, Information Security, or a related field with practical experience.

• Over 8 years of experience in SRE, DevOps, or infrastructure engineering positions.

• Extensive expertise in managing cloud-based infrastructure, preferably in Azure.

• Strong hands-on experience with Kubernetes, including cluster setup, networking, access control, and authorization.

• Knowledge of deployments, services, config maps, secrets, and cronjobs.

• Designing, deploying, and maintaining service mesh infrastructure.

• Strong experience with GitHub Actions and CI/CD pipelines.

• Experience in supporting production environments and high-availability systems.

• Familiarity with Agile methodologies (Scrum, sprints, backlogs).

• Experience managing certificates, secrets, and monitoring systems.

• Excellent collaboration skills within a large, evolving engineering organization.

• Proven ability to lead complex technical initiatives across multiple teams.

• Proactive approach with a focus on continuous improvement.


🏝️ Benefits

• Flexible time off, including 12 paid holidays.

• 401(k) match, plus 100% employer-paid medical, dental, and vision premiums.

• Company contributions to Health Savings Account.

• Stock options.
