Remotery

Senior Site Reliability Engineer

Posted May 31

This is a fully remote position, open to applicants in Bulgaria.

📋 Description

• Lead the evolution of our platform by designing and managing our Kubernetes ecosystem (GKE, multi-cluster) with an emphasis on high availability and zero-downtime operations.

• Develop "Paved Roads": Take ownership of our PaaS strategy, utilizing GitOps (ArgoCD) and CI/CD (GitLab) to enable domain teams to deploy independently.

• Architect for reliability: Establish and execute our observability strategy across metrics, logs, and tracing (Prometheus, VictoriaMetrics, OpenTelemetry).

• Champion Infrastructure-as-Code: Spearhead the automation of our infrastructure with Terraform, ensuring all resources are standardized and version-controlled.

• Manage the Error Budget: Collaborate with engineering teams to define and oversee SLOs, SLAs, and frameworks for incident management.

• Master Disaster Recovery: Design and run regular DR drills, using blue/green and active/passive strategies across regions to guarantee service continuity.

• Innovate Operations: Actively implement AI-driven methodologies to enhance operational efficiency and automate bottleneck detection.


⛳️ Requirements

• Extensive hands-on experience in managing Kubernetes (preferably GKE) in high-load, multi-cluster production settings.

• In-depth experience with GCP (AWS is a significant plus) and Terraform for large-scale infrastructure management.

• Strong familiarity with ArgoCD, GitLab CI, and the "Infrastructure as Code" approach.

• Deep knowledge of the Prometheus/Grafana stack and experience implementing tracing and logging at scale.

• Demonstrated ability to design highly available 24/7 systems with automated failover and rollback capabilities.

• English proficiency at B2+ level for effective cross-functional communication.


🏝️ Benefits

• Make a genuine impact on the product.

• Work in the EU.

• Receive stock options.

• Receive unwavering support and care.

• Participate in the Work & Swim program.

• Equal Opportunity Statement.
