Remotery

Senior Site Reliability Engineer

Posted Jun 21

This is a fully remote position, open to applicants in New York.

📋 Description

• Ensure the stability and scalability of our worldwide compute platform, which spans numerous data centers, multiple public clouds, and on-premises environments and serves as the foundation for all of our products.

• Manage and enhance our GitOps delivery model, utilizing Rancher Fleet and Flux with Helm to deploy core cluster services and application workloads in a declarative and repeatable manner.

• Create self-healing, fault-tolerant infrastructure and internal tools that eliminate repetitive operational tasks and minimize toil for both platform and application teams.

• Take ownership of cluster autoscaling and capacity strategy, including Karpenter, HPA, KEDA, and predictive scaling guided by event and calendar data.

• Establish SLOs and reliability metrics for platform components, leveraging Datadog and our logging pipeline to highlight cluster and workload health.

• Foster technical growth by sharing knowledge, engaging in design discussions, and promoting a collaborative team culture; participate in on-call rotations.


⛳️ Requirements

• Bachelor's degree in Computer Science or equivalent education, experience, and training.

• A minimum of 4 years of experience managing distributed cloud and on-premises environments at scale, with substantial hands-on experience in AWS.

• Familiarity with GCP, vSphere, or Nutanix is advantageous.

• Extensive expertise in container orchestration with Kubernetes, including the capability to design, scale, and troubleshoot intricate workloads.

• Strong background in developing software for automation and infrastructure tooling, particularly using Go and Python.

• Proficiency in networking and Linux-based systems, including container runtimes such as Docker and containerd, as well as packet-level debugging and kernel troubleshooting.

• Experience with Infrastructure as Code (IaC) and configuration management tools to ensure scalable and repeatable infrastructure provisioning.


🏝️ Benefits

• Bonus

• Equity

• Benefits as applicable
