
Site Reliability Engineer
Posted Jun 20

This is a fully remote position, open to applicants in Kansas.
• Acquire a thorough understanding of VDC workloads, dependencies, and operational processes by reviewing code, documentation, and collaborating with subject matter experts (SMEs).
• Develop and keep updated runbooks, incident guides, and operational documentation.
• Aid in knowledge transfer and assist in the creation of onboarding materials for the team.
• Engage in incident response activities, including triage, investigation, mitigation, and postmortems.
• Assist in the implementation and upkeep of SLIs, SLOs, and error budgets as defined by the team.
• Identify reliability challenges during incidents or reviews and suggest specific improvements.
• Contribute to high availability and fault tolerance initiatives on Azure, including Azure Government.
• Bridge monitoring gaps by setting up instrumentation, alerting, and dashboards in line with team standards.
• Help reduce toil through automation and improvements in tooling.
• Take part in on-call rotations.
• Work with Infrastructure as Code (IaC), CI/CD pipelines, and deployment tools in compliance-restricted settings.
• Assist in testing, canary deployments, and validation workflows for releases.
• Implement infrastructure and configuration changes according to established patterns and review protocols.
• Collaborate with engineering, security, compliance, and operations teams to achieve reliability enhancements.
• Clearly communicate system behavior, risks, and status, both in writing and during meetings.
• Proactively identify blockers and gaps; do not wait for issues to escalate.
• Over 3 years of experience in Software Engineering, with a minimum of 1 year in SRE, Platform Engineering, or DevOps focused on cloud-hosted services.
• Proficient with cloud infrastructure on Azure or a similar cloud provider.
• Knowledge of regulated or compliance-focused environments such as government (FedRAMP, CMMC), financial (PCI-DSS), or healthcare (HIPAA). You recognize how compliance impacts operational capabilities.
• Able to read and understand code well enough to investigate system behavior independently.
• Familiarity with monitoring and observability tools (e.g., Prometheus, Grafana, OpenTelemetry, ELK stack).
• Experience using IaC tools (Terraform, Terragrunt, or Pulumi) and container orchestration (Kubernetes).
• Knowledge of CI/CD tools such as GitHub Actions, Azure DevOps, GitLab CI, or ArgoCD.
• Strong programming abilities in one or more languages: TypeScript/JS, Go, Java, C#, or similar.
• Solid understanding of the principles of distributed systems and basic networking.
• Excellent written and verbal communication skills.
• Unlimited paid time off, 12 paid holidays (including 4 global VeeaMe Days for self-care), and 24 paid volunteer hours annually through Veeam Cares.
• Paid parental leave: 8 weeks for all parents, 16 weeks for birthing parents.
• Medical, dental, and vision coverage starting on your first day.
• Mental health support, therapy sessions, and digital wellness tools via our Employee Assistance Program.
• 401(k) retirement plan with company matching contributions.
• Fertility, adoption, and surrogacy assistance through Maven.
• AirVet: 24/7 virtual veterinary care at no cost.
• Legal services, identity protection, and additional health insurance options.
• Tax-advantaged spending accounts for healthcare, dependent care, and commuting.
• Opportunities for learning and growth through on-demand libraries (LinkedIn Learning, O’Reilly), mentoring, workshops, and learning events such as our annual Global Day of Learning.