Remotery

Senior DevOps Engineer/Site Reliability Engineer

Posted Jun 2

This is a fully remote position, open to applicants in New York.

📋 Description

• Operate and maintain Kubernetes clusters and containerized applications.

• Manage cloud infrastructure across providers such as OCI, AWS, GCP, or Azure.

• Design and maintain CI/CD pipelines for reliable application deployments.

• Implement and manage Infrastructure as Code (IaC) using Terraform and Helm.

• Create automation tools and operational processes using Python, Go, or Bash.

• Drive observability initiatives, including improvements to monitoring, logging, tracing, and alerting.

• Monitor, troubleshoot, and resolve production incidents, and participate in on-call rotations.

• Support and enhance distributed data platforms including Kafka, Elasticsearch, Spark, Redis, and MongoDB.

• Improve platform reliability, scalability, and operational efficiency through SRE best practices.

• Collaborate with cross-functional teams across various time zones.

• Perform Linux system administration and network troubleshooting.

• Participate in incident response processes, postmortems, and reliability enhancements.

• Assist in GitOps and deployment workflows with tools like ArgoCD and GitHub Actions.

• Assess and adopt AI-assisted operational tools for auto-remediation, alert correlation, and operational intelligence.


⛳️ Requirements

• Over 5 years of experience in DevOps, SRE, or Platform Engineering roles.

• Strong proficiency in Kubernetes, Docker, and container orchestration.

• Practical experience in managing production cloud environments.

• Strong knowledge of Infrastructure as Code with Terraform and Helm.

• Experience with CI/CD tools and deployment automation practices.

• Advanced troubleshooting skills in Linux systems, networking, and distributed systems.

• Familiarity with observability platforms such as Prometheus, Grafana, Loki, Alertmanager, and Elastic Stack.

• Strong programming and scripting skills in Python, Bash, or Go.

• Background in supporting high-availability production systems and on-call operations.

• Knowledge of incident management and reliability engineering methodologies.

• Understanding of data platform technologies like Kafka, Spark, Elasticsearch, Redis, or MongoDB.

• Awareness of AI-driven operational tools and automated remediation strategies.

• Excellent communication, collaboration, and problem-solving abilities.

• Must reside on the East Coast.


🏝️ Benefits

• Pre-IPO Stock Options

• Medical, Dental & Vision care

• 401(k)

• Employee Assistance Program

• Employee Discount Program

• Life Insurance

• Paid time off

• Referral Program

• Rewards and Recognition Program

