Remotery

Senior Site Reliability Engineer (SRE)

Posted 6 days ago

This is a fully remote position, open to applicants in Serbia.

📋 Description

• Lead the evolution of our platform: Design and manage our Kubernetes ecosystem (GKE, multi-cluster) with a focus on high availability and zero-downtime operations.

• Develop "Paved Roads": Take ownership of our PaaS strategy, utilizing GitOps (ArgoCD) and CI/CD (GitLab) to enable domain teams to deploy autonomously.

• Architect for reliability: Establish and implement our observability strategy across metrics, logs, and tracing (Prometheus, VictoriaMetrics, OpenTelemetry).

• Champion Infrastructure-as-Code: Drive the automation of our infrastructure using Terraform so that all resources are standardized and version-controlled.

• Manage the Error Budget: Collaborate with engineering teams to define and maintain SLOs, SLAs, and incident management frameworks.

• Master Disaster Recovery: Design and run regular disaster recovery drills, using blue/green and active/passive strategies across regions to ensure service continuity.

• Innovate Operations: Use AI-driven methods to improve operational efficiency and automate bottleneck detection.


⛳️ Requirements

• Mastery of production K8s: Extensive hands-on experience managing Kubernetes (preferably GKE) in high-load, multi-cluster production settings.

• Cloud Infrastructure Expertise: Deep experience with GCP (AWS is a significant advantage) and with Terraform for large-scale infrastructure projects.

• GitOps Proficiency: Strong background in ArgoCD, GitLab CI, and the "Infrastructure as Code" philosophy.

• Observability Specialist: In-depth knowledge of the Prometheus/Grafana stack and experience in implementing tracing/logging at scale.

• System Design Skills: Demonstrated ability to design highly available 24/7 systems with automated failover and rollback.

• Fluency in English: B2+ level for effective cross-functional communication.


🏝️ Benefits

• Make a genuine impact on the product.

• Grow with us: we offer resources and opportunities for continuous personal and professional development.

• Enjoy the flexibility of traveling and working remotely or in a hybrid model across Europe.

• Become a stock options holder through our Stock Options Program.

• Receive unwavering support and care, ensuring your Finom experience is successful and fulfilling.

• Immerse yourself in our exclusive Work & Swim Program in a comfortable corporate apartment in Cyprus.

• We are an Equal Opportunity Employer and value diversity in our company.
