
Senior Site Reliability Engineer, Government
Posted 16 hours ago

This is a fully remote position open to applicants in the United States.
• Facilitate ongoing software delivery, manage incidents, conduct postmortems, and develop automation strategies for deployment, self-testing, and alerting.
• Lead incident management for production-related issues, ensuring swift recovery, root cause analysis, and preventive follow-up measures.
• Enhance and optimize the observability strategy by collaborating with application engineering teams to create monitoring solutions that improve alerting capabilities and minimize noise.
• Establish, implement, and track SLOs, SLIs, and SLAs in conjunction with product and engineering teams to align with business goals.
• Design, develop, and sustain software solutions that tackle operational, compliance, and pipeline challenges.
• Take ownership of and coordinate all government environment releases, driving process enhancements to boost the efficiency, reliability, and visibility of the release pipeline.
• Collaborate across functions with engineering, product, SecOps, compliance, and leadership teams to align priorities, define testing strategies, and address challenges.
• Ensure that all infrastructure and deployments comply with FedRAMP, government regulations, and industry standards, while maintaining necessary release documentation and risk assessments.
• Over 5 years of experience in SRE, DevOps, or Infrastructure Engineering for SaaS products, with a minimum of 4 years managing operations at scale.
• More than 2 years of production experience with a container orchestration system (Kubernetes preferred) and Continuous Delivery practices.
• Strong knowledge of compliance frameworks pertinent to government deployments (e.g., FedRAMP, DoD, NIST SP 800-53, NIST SP 800-137).
• Experience with multi-cloud environments in AWS/GCP (expertise in AWS preferred).
• Proven experience with at least one major programming language (Python, Go, Ruby, etc.) and proficiency in bash scripting to enhance operational workflows.
• Familiarity with GitOps frameworks, Infrastructure as Code (IaC) tools (Terraform or Pulumi), and deployment methodologies (blue-green, rolling, and canary deploys).
• Experience with industry-standard observability stacks (Prometheus, Grafana, ELK, OpenTelemetry, etc.) and incident management procedures.
• Established background in implementing and supporting FedRAMP, security, risk management, and compliance processes for software releases.
• Experience collaborating directly with government agencies or in highly regulated sectors.
• Familiarity with testing strategies and automation within large-scale environments.
• Restricted Stock Units (RSUs)
• Employee Stock Purchase Plan (ESPP)
• Flexible time off
• Paid company holidays and paid sick time
• Gender-neutral parental leave
• Grandparent leave
• Medical, dental, and vision coverage
• 401(k) retirement plan with company match
• Life and disability insurance
• Health and dependent care FSA
• Voluntary benefits (hospital, accident, critical illness)
• Employee Assistance Program (EAP)
• ARAG pre-paid legal
• Nationwide pet insurance
• Cancer Care program
• Global business travel medical insurance
• Home office allowance
• Mobile phone reimbursement
• Wellness coach
• Wellness/gym reimbursement
• Fertility coverage
• Adoption & surrogacy reimbursement