Remotery

Principal Site Reliability Engineer (SRE)

Posted Jun 20

This is a fully remote position, open to applicants in the United States.

📋 Description

• Act as the main technical authority for ensuring production reliability across customer environments in the U.S.

• Analyze and resolve complex issues involving web applications, APIs, backend services, data pipelines, cloud infrastructure, and customer integrations.

• Lead incident response for production issues, coordinating with cross-functional teams to restore services while minimizing the impact on customers.

• Conduct root cause analyses and implement corrective actions to enhance long-term system stability and resilience.

• Collaborate with software engineering and platform teams to identify recurring reliability challenges and develop sustainable solutions.

• Design, configure, and validate secure connectivity solutions for customers, including Site-to-Site VPNs, Transit Gateway integrations, routing configurations, and secure network paths.

• Assist in customer onboarding by troubleshooting connectivity issues and ensuring consistent implementation procedures.

• Improve platform observability through enhancements in monitoring, logging, alerting, tracing, and operational dashboards.

• Contribute to CI/CD, infrastructure automation, and deployment processes that enhance release safety and operational consistency.

• Create operational tools that aid in incident response, troubleshooting, onboarding, and system monitoring activities.

• Work closely with engineering leadership to enhance cloud architecture, scalability, security, and operational readiness.

• Collaborate with customer-facing teams to communicate technical challenges, remediation strategies, and reliability improvements clearly and effectively.

• Support initiatives related to compliance, security, and risk management within highly regulated healthcare environments.


⛳️ Requirements

• 6+ years of hands-on experience supporting and managing AWS-based production environments.

• 4+ years of experience supporting web applications and backend services (Python/Django experience strongly preferred).

• Proficient in AWS networking technologies such as VPCs, Site-to-Site VPNs, Transit Gateways, routing, NAT gateways, and security groups.

• Strong expertise in Terraform and infrastructure-as-code deployment methodologies.

• Experience with containerized environments, including ECS, Fargate, Kubernetes, or similar technologies.

• Proven experience in building and maintaining CI/CD pipelines and automation for release processes.

• Familiar with monitoring and observability tools like Datadog, CloudWatch, Sentry, Grafana, or similar platforms.

• Experienced in leading production incidents, managing outages, and conducting root cause analysis.

• Familiarity with Windows Server environments, Active Directory, Kerberos, and enterprise infrastructure concepts is preferred.

• Experience in healthcare technology, healthcare SaaS, clinical software, or other regulated industries is preferred.

• Bachelor’s degree in Computer Science, Engineering, Information Technology, or a related technical field is preferred.


🏝️ Benefits

• Health Care Plan (Medical, Dental & Vision)

• Retirement Plan (401(k), IRA)

• Paid Time Off (Vacation, Sick & Public Holidays)
