
Senior Site Reliability Engineer
Posted 1 day ago

This is a fully remote position, open to applicants in Virginia.
Responsibilities:
• Establish the strategy for Service Level Objectives (SLOs) and Error Budgets.
• Build telemetry pipelines that provide full-stack observability.
• Design and oversee the enterprise Infrastructure as Code (IaC) standards.
• Build custom tools to automate complex recovery procedures and system scaling.
• Serve as the Incident Commander during significant system outages, leading the technical response and steering the Root Cause Analysis (RCA) process.
• Spearhead the integration of security-as-code within DevSecOps pipelines, ensuring adherence to RMF and NIST 800-53 standards.
• Offer technical guidance and mentorship to Mid-Level SREs and developers, promoting a culture of reliability throughout the organization.
Qualifications:
• Over 7 years of experience in SRE or DevOps, with substantial exposure to distributed systems.
• Proficiency in Go, Python, or Java, along with an advanced understanding of Linux internals.
• Significant experience in managing production Kubernetes environments and complex cloud infrastructures.
• Demonstrated success in defining and achieving SLOs for high-availability systems.
• Familiarity with government Risk Management Framework (RMF) processes.
• Education: Bachelor’s or Master’s degree in Computer Science or Engineering.
• Certifications: CKA (Certified Kubernetes Administrator) and a relevant industry observability certification preferred.
Benefits:
• Comprehensive health, dental, and vision insurance.
• Flexible work hours and remote working options.
• Professional development opportunities and training programs.
• Generous vacation and paid time off policies.
• 401(k) plan with company matching.