This is a fully remote position, open to applicants in Germany.

📋 Description

• As a Site Reliability Engineer on our Platform Squad, you will be instrumental in maintaining the speed, resilience, and scalability of Flip's infrastructure.

• You will cultivate a culture of reliability, develop tools, and establish practices that empower our engineering teams to deploy confidently—at scale and without sacrificing availability.

• This position is ideal for an engineer who enjoys building high-throughput, highly available systems and wants to shape how a fast-growing SaaS platform runs in production.

• Enable scaling: evolve our Azure cloud infrastructure and Kubernetes clusters for high throughput and high availability to support Flip’s rapid global expansion.

• Ensure resilience & security: design and implement zero-downtime deployments, rollback procedures, and disaster recovery strategies that keep our platform available around the clock.

• Build observability: extend our LGTM stack (Loki, Grafana, Tempo, Mimir) to give every team the visibility it needs, and use it to define and improve our SLOs.

• Automate everything: design, build, and refine Infrastructure as Code using Pulumi in Go to eliminate manual work and offer our platform to engineering teams as a self-service product.

• Drive reliability practices: champion CI/CD best practices, incident management, and post-mortems, and improve the developer experience across the engineering organization.

• Shape our roadmap: collaborate with your squad and engineering leadership to outline the platform's direction—from scalable, high-throughput systems and cost optimization to security posture and compliance.

⛳️ Requirements

• 1–3 years of practical experience as a Site Reliability Engineer (SRE), Platform Engineer, DevOps Engineer, Infrastructure Engineer, Cloud Engineer, or Backend Engineer with a strong emphasis on infrastructure.

• Proficiency in operating and scaling cloud infrastructure (e.g., Azure, GCP, or AWS).

• Extensive knowledge of Kubernetes and container orchestration in production settings.

• Practical experience with modern observability stacks (e.g., Prometheus, Mimir, Loki, ELK) and familiarity with defining and managing SLOs and error budgets.

• Strong software development skills in Go (preferred, since our IaC is written with Pulumi in Go), Python, or Kotlin.

• Practical experience with Infrastructure as Code (e.g., Pulumi, OpenTofu, Terraform) and configuration management tools (e.g., Ansible, Chef).

• A collaborative mindset, excellent communication skills, and business-fluent English.

• Willingness to participate in on-call rotations to ensure the reliability of our platform.

🏝️ Benefits

• Work mode: We are remote-first, providing you the flexibility to work from home, while also valuing the advantages of in-person collaboration. Depending on the role, you may occasionally attend team events, workshops, or meetings at our offices in Berlin or Stuttgart—always with ample notice. The exact balance will be discussed transparently during your application process.

• Work–life balance: We don’t want you to be tethered to your desk, so we cover the cost of your E-Gym/Wellpass membership and offer company bike leasing (JobRad).

• Celebrate successes: You’ll collaborate with highly motivated, dedicated individuals in a relaxed working environment.

• Be in on the action: You will actively shape Flip. Along the way, you’ll support the rapid growth of a young tech company and grow toward your own goals. A positive atmosphere is guaranteed.

• Happy to be a Flipster: Look forward to regular team events and Culture Days that build camaraderie among Flipsters.

• Work abroad: At Flip, you also have the option to work from other European countries—let’s discuss workation during the interview.

People also viewed

Lead DevOps Engineer, Data & AI Platform

DevOps Engineer, German

Site Reliability Engineer – Kubernetes Platform

Lead DevOps Engineer – Data & AI Platform

Security Engineer, DevSecOps

Cloud Operations Engineer