This is a fully remote position, open to applicants in Kentucky.

📋 Description

• Design, implement, and maintain automation for infrastructure provisioning, configuration management, and application deployments across on-premises and cloud environments.

• Actively monitor system health, performance, and availability using a variety of observability tools while defining key performance indicators (KPIs) and service level objectives (SLOs).

• Lead the analysis and resolution of intricate production incidents, conduct root cause analysis, and establish preventative measures to reduce future occurrences.

• Collaborate with development teams to ensure that software is engineered for reliability, scalability, and operational efficiency, participating in architectural reviews and providing expert advice.

• Create and maintain comprehensive incident response protocols, runbooks, and disaster recovery strategies.

• Contribute to the advancement of our SRE practices, tools, and standards, fostering continuous improvement and knowledge sharing within the team.

• Engage in an on-call rotation to provide 24/7 support for critical production systems.

• Mentor junior SREs and assist in the growth and development of the team.

• Assess and implement new technologies and solutions to improve system reliability and operational efficiency.

⛳️ Requirements

• Bachelor's degree in Computer Science, Engineering, or a related technical field, or equivalent practical experience.

• Over 5 years of experience in a Site Reliability Engineering, DevOps, or closely related infrastructure engineering position.

• Strong proficiency in at least one scripting/programming language (such as Python, Go, Java, Ruby, or Bash).

• Extensive experience with cloud platforms (AWS, Azure, GCP) including services related to compute, networking, storage, and databases.

• In-depth understanding of Linux operating systems and networking fundamentals.

• Proven experience with infrastructure as code tools (like Terraform, CloudFormation, or Ansible).

• Solid background in CI/CD pipelines and related tools (such as Jenkins, GitLab CI, or GitHub Actions).

• Demonstrated expertise in monitoring and alerting systems (for example, Prometheus, Grafana, Datadog, or Splunk).

• Strong problem-solving abilities with a structured approach to diagnosing complex distributed systems.

• Excellent communication and teamwork skills, with the ability to work effectively with cross-functional teams.

• Experience with containerization technologies (Docker, Kubernetes) is highly desirable.

• Familiarity with database technologies (both relational and NoSQL) and their operational challenges.

🏝️ Benefits

• Competitive total rewards (base salary + bonus, if applicable).

• Customizable benefits package (3 medical plans with Health Savings Account company match).

• Generous paid time off starting with 3 weeks + 13 paid holidays, including 2 personal floating holidays.

• Flexible time off for exempt team members + 13 paid holidays.

• Paid parental leave (including maternity + paternity leave).

• Education assistance opportunities and free LinkedIn Learning access.

• Free mental health and family planning programs, including adoption assistance and fertility support.

• 401(k) program with company match.

• Pet insurance.

• Employee resource groups.

Senior SRE
