
Senior Site Reliability Engineer
Posted 1 day ago

This is a fully remote position, open to applicants in New Mexico.
• Collaborate with software developers, platform engineers, and IT personnel to enhance system design, operational efficiency, deployment safety, and readiness for production support.
• Establish and uphold operational standards; create runbooks, support procedures, escalation paths, and service-level objectives (SLOs).
• Assess system architecture and modifications to ensure they meet functional requirements while balancing service quality, reliability, security, and compliance needs.
• Drive ongoing improvements in platform stability, maintainability, and availability.
• Deliver advanced technical support and troubleshooting for complex platform and service issues affecting internal users and stakeholders.
• Over 8 years of experience in Site Reliability Engineering, DevOps, Platform Engineering, Systems Engineering, or similar infrastructure roles that support production services.
• Extensive experience in Linux systems administration and troubleshooting within enterprise environments.
• Significant experience in operating and maintaining on-prem Kubernetes platforms and all related components, including CRI, CNI, and CSI plugins.
• Proficient in deploying and managing applications on Kubernetes using tools such as Helm and Kustomize.
• Familiarity with development and collaboration tools such as GitLab, Artifactory, Jira, and Confluence.
• Experience with GitOps tools such as FluxCD or ArgoCD.
• Skilled in scripting with at least one of the following: Python, Go, or Bash.
• Strong background in designing, maintaining, and improving observability tooling, including monitoring, dashboards, logging, and tracing, in support of SLOs.
• Solid understanding of reliability engineering principles, including service health indicators; high-availability design; failure reduction and failure testing; operational readiness practices (documentation, runbooks, and architecture descriptions); incident response; root cause analysis; and remediation and recovery.
• Ability to obtain a security clearance, which requires U.S. citizenship.
• Equal Opportunity Employer/Protected Veterans/Individuals with Disabilities
Ad Hoc LLC