
Site Reliability Specialist – Observability, Kubernetes
Posted May 2

• Lead the design, implementation, and enhancement of Everbridge’s observability stack.
• Develop and sustain a highly available and scalable observability platform.
• Standardize instrumentation, dashboards, alerts, and Service Level Objectives (SLOs).
• Assist in incident response, root cause analysis, and capacity planning.
• Operate and scale Grafana and related technologies.
• Ensure the reliability and security of EKS clusters that support observability.
• Manage the lifecycle and upgrades of clusters.
• Utilize Terraform for infrastructure provisioning.
• Implement GitLab CI/CD at scale.
• Over 6 years of experience in Site Reliability Engineering (SRE) or Platform Engineering.
• Extensive experience with the Grafana ecosystem.
• Proficient in Kubernetes and Amazon EKS.
• Strong skills in Terraform.
• Experience with OpenTelemetry is preferred.
• Familiarity with large-scale observability systems is preferred.
• Experience in cost optimization is preferred.
• Healthcare coverage.
• Dental insurance.
• Parental planning support.
• Mental health benefits.
• Disability income benefits.
• Life and Accidental Death & Dismemberment (AD&D) insurance.
• 401(k) plan with matching contributions.
• Paid time off.
• Fitness reimbursements.