Remotery

Senior Site Reliability Engineer, SRE

Posted 6 days ago

This is a fully remote position, open to applicants in Spain.

📋 Description

• Lead the Platform Evolution: Design and manage our Kubernetes ecosystem (GKE, multi-cluster) with an emphasis on high availability and zero-downtime operations.

• Build "Paved Roads": Take ownership of and enhance our PaaS strategy, utilizing GitOps (ArgoCD) and CI/CD (GitLab) to enable domain teams to deploy autonomously.

• Architect Reliability: Define and implement our observability strategy encompassing metrics, logs, and tracing (Prometheus, VictoriaMetrics, OpenTelemetry).

• Drive Infrastructure-as-Code: Spearhead the automation of our infrastructure using Terraform, ensuring all resources adhere to standards and are version-controlled.

• Own the Error Budget: Collaborate with engineering teams to establish and oversee SLOs, SLAs, and frameworks for incident management.

• Disaster Recovery Mastery: Design and run regular DR drills, implementing blue/green and active/passive strategies across regions to maintain service continuity.

• Innovate Operations: Actively apply AI-driven strategies to enhance operational efficiency and automate the detection of bottlenecks.


⛳️ Requirements

• Production K8s Mastery: Extensive hands-on experience managing Kubernetes (GKE preferred) in high-load, multi-cluster production settings.

• Cloud Infrastructure: Deep experience with GCP (AWS is a significant advantage) and Terraform for large-scale infrastructure projects.

• GitOps Expertise: Comprehensive experience with ArgoCD, GitLab CI, and the principles of 'Infrastructure as Code'.

• Observability Expertise: In-depth understanding of the Prometheus/Grafana stack and experience implementing tracing and logging at scale.

• System Design: Demonstrated ability to design highly available systems that operate 24/7 with automated failover and rollback.

• English Fluency: English at B2+ level or higher for effective cross-functional communication.


🏝️ Benefits

• Make a genuine impact on the product: Join our upward trajectory and grow with us.

• Work in the EU: Enjoy the flexibility of traveling and working remotely or in a hybrid model across Europe.

• Become a stock options holder: Unlock your inner entrepreneur and align your aspirations with ours through our Stock Options Program.

• Receive unwavering support and care: We provide constant support to ensure your Finom experience is successful and fulfilling.

• Work & Swim program: Spend one month in a comfortable corporate apartment in enchanting Cyprus.

• Equal Opportunity Statement: We embrace diversity and invite applications from all walks of life.
