
Senior Software Engineer – Site Reliability
Posted May 7

This is a fully remote position, open to applicants in the United States.
• Ensures the platform's stability, scalability, and overall performance.
• Improves product reliability by building automation for complex infrastructure and operational problems.
• Advocates for application availability and effectiveness through active monitoring, performance optimization, and strategic enhancements.
• Conducts post-mortem analyses, creates automation to minimize operational burdens, and collaborates with product owners and developers.
• Participates in tool selection, assists with capacity planning, and establishes monitoring and alerting systems to meet business-defined Service Level Objectives (SLOs).
• Guides less experienced engineers to cultivate a culture of operational excellence.
• Must be at least eighteen years of age.
• Must have legal authorization to work in the United States.
• Proficient in Google Cloud Platform (GCP) infrastructure.
• Familiar with observability tools such as Grafana, Prometheus, Loki, and Tempo.
• Experienced with LitmusChaos for destructive (chaos) testing.
• Knowledgeable in k6 for performance testing.
• Proficient in Terraform Enterprise for infrastructure as code (IaC).
• Familiar with GitHub for source control management (SCM).
• Knowledgeable in cdk8s for defining Kubernetes manifests as code.
• Experienced with GitHub Copilot for AI-assisted development.
• Well-versed in SRE practices, including Production Readiness Review, Capacity Planning, Change Validation, and Production Support.
• At least 3 years of experience in software development.
• Health insurance
• 401(k) matching
• Flexible work hours
• Paid time off
• Remote work options