
Senior Site Reliability Engineer, SRE
Posted 6 days ago

This is a fully remote position, open to applicants in Serbia.
• Lead the evolution of our platform: Design and manage our Kubernetes ecosystem (GKE, multi-cluster) with a focus on ensuring high availability and zero-downtime operations.
• Develop "Paved Roads": Take ownership of our PaaS strategy, utilizing GitOps (ArgoCD) and CI/CD (GitLab) to enable domain teams to deploy autonomously.
• Architect for reliability: Establish and implement our observability strategy across metrics, logs, and tracing (Prometheus, VictoriaMetrics, OpenTelemetry).
• Champion Infrastructure-as-Code: Drive the automation of our infrastructure using Terraform, guaranteeing that all resources are standardized and version-controlled.
• Manage the Error Budget: Collaborate with engineering teams to create and oversee SLOs, SLAs, and incident management frameworks.
• Master Disaster Recovery: Design and run regular disaster recovery drills, using blue/green and active/passive strategies across regions to ensure service continuity.
• Innovate Operations: Use AI-driven methods to improve operational efficiency and automate bottleneck detection.
• Mastery of production K8s: Extensive hands-on experience managing Kubernetes (preferably GKE) in high-load, multi-cluster production settings.
• Cloud Infrastructure Expertise: Deep experience with GCP (AWS experience is a significant advantage) and with Terraform on large-scale infrastructure projects.
• GitOps Proficiency: Strong background in ArgoCD, GitLab CI, and the "Infrastructure as Code" philosophy.
• Observability Specialist: In-depth knowledge of the Prometheus/Grafana stack and experience in implementing tracing/logging at scale.
• System Design Skills: Demonstrated ability to design highly available 24/7 systems with automated failover and rollback capabilities.
• Fluency in English: B2+ level for effective cross-functional communication.
• Make a genuine impact on the product.
• Join our upward trajectory and grow with us. We offer resources and opportunities for continuous personal and professional development.
• Enjoy the flexibility of traveling and working remotely or in a hybrid model across Europe.
• Become a stock options holder through our Stock Options Program.
• Receive unwavering support and care, ensuring your Finom experience is successful and fulfilling.
• Immerse yourself in our exclusive Work & Swim Program in a comfortable corporate apartment in Cyprus.
• We are an Equal Opportunity Employer that values diversity in our company.