This is a fully remote position, open to applicants in Pennsylvania.

📋 Description

• Schedule and execute large-scale batch workloads across Kubernetes clusters.

• Diagnose and troubleshoot job failures for clients.

• Collaborate with teams across the organization to understand workload requirements and enhance platform capabilities.

• Enhance the reliability and speed of our systems and processes by increasing automation.

• Document procedures to create a detailed library of runbooks, serving as a knowledge base and foundation for automation.

• Participate in an on-call rotation to maintain the SLOs and SLAs of production services.

• Contribute to platform tooling, automation, and CI/CD workflows.

⛳️ Requirements

• A solid understanding of Linux operating system internals, TCP/IP networking, and storage subsystems.

• Extensive experience with Kubernetes and container orchestration in production-grade environments.

• Knowledge of engineering design limitations and the ability to advise teams on scaling their services to meet performance goals within budget.

• Strong experience in implementing and troubleshooting cloud-native and open-source tools like Kubernetes, etcd, Prometheus, and OpenTelemetry.

• Excellent communication skills and the ability to work effectively in a diverse, distributed team.

🏝️ Benefits

• We are proud to be an equal opportunity workplace.

• We believe that diverse teams produce the best ideas and outcomes.

• We are committed to fostering a culture of inclusion, entrepreneurship, and innovation across gender, race, age, sexual orientation, religion, disability, and identity.


Site Reliability Engineer
