
Senior Site Reliability Engineer
Posted 18 hours ago

Responsibilities
• Establish the strategy for Service Level Objectives (SLOs) and Error Budgets.
• Build telemetry pipelines that provide full-stack observability.
• Design and govern enterprise standards for Infrastructure as Code (IaC).
• Build custom tools to automate complex recovery processes and system scaling.
• Serve as the Incident Commander during major system outages, leading the technical response and managing the Root Cause Analysis (RCA) process.
• Lead the integration of security as code into DevSecOps pipelines, ensuring compliance with the Risk Management Framework (RMF) and NIST SP 800-53 controls.
• Provide technical guidance and mentorship to mid-level SREs and developers, promoting a culture of reliability across the organization.

Requirements
• 7+ years of experience in SRE or DevOps roles, with a strong focus on distributed systems.
• Proficiency in Go, Python, or Java, along with advanced knowledge of Linux internals.
• Significant experience managing production Kubernetes environments and complex cloud architectures.
• Demonstrated ability to define and achieve SLOs for high-availability systems.
• Familiarity with government Risk Management Framework (RMF) processes.
• Education: Bachelor’s or Master’s degree in Computer Science or Engineering.
• Certifications: CKA (Certified Kubernetes Administrator) and an industry-recognized observability certification preferred.

Benefits
• Competitive salary and performance-based bonuses.
• Comprehensive health, dental, and vision insurance.
• Flexible working hours and remote work options.
• Opportunities for professional development and continuous learning.