This is a fully remote position, open to applicants in Germany.

📋 Description

• Co-owner of the architecture: Help shape the architecture and development of our Azure cloud infrastructure and Kubernetes clusters, built for high throughput and high availability, to support Flip's rapid global expansion.

• Drive the resilience strategy: Establish our methodology for global scaling, zero-downtime deployments, rollback procedures, and disaster recovery, ensuring the platform maintains 24/7 availability.

• Evolve our observability stack: Develop our LGTM stack (Loki, Grafana, Tempo, Mimir) into a reliable foundation that engineers can depend on.

• Improve our IaC platform: Eliminate operational toil at its source and turn our infrastructure into a genuine self-service platform for engineering teams.

• Incident leadership: Lead the response to major platform incidents, run blameless post-mortems, and turn the findings into lasting improvements.

• Squad mentoring: Guide team members, lead RFCs and design reviews within the squad, and help engineers grow into stronger SREs.

• Shape our roadmap: Collaborate with your squad to establish the future direction of the platform.

⛳️ Requirements

• 5+ years of practical experience as a Site Reliability Engineer (SRE), Platform Engineer, DevOps Engineer, Infrastructure Engineer, Cloud Engineer, or Backend Engineer with a strong emphasis on infrastructure.

• Proven track record of building and operating high-throughput, highly available production systems.

• Extensive production-level experience with Kubernetes on a major hyperscaler.

• Solid experience with modern observability stacks (e.g., Prometheus, Mimir, VictoriaMetrics, dashboards, Loki, ELK) and a clear understanding of SLIs, SLOs, and error budgets.

• Strong software development capabilities in Go (preferred, as our IaC is built with Pulumi in Go) or Python.

• Practical experience with Infrastructure as Code (Pulumi, OpenTofu, Terraform) and GitOps (e.g., Argo CD) along with CI/CD pipeline design.

• Proven ability to lead complex infrastructure initiatives from design to production, including authoring RFCs and steering architectural decisions within the team.

• Experience mentoring engineers and raising the technical level of a team.

• Confidence in taking end-to-end ownership of critical incidents, and the ability to translate learnings into lasting technical improvements.

• Strong communication skills and business-fluent English.

• Willingness to participate in on-call rotations to maintain the reliability of our platform.

🏝️ Benefits

• Work mode: We are remote-first, so you have the flexibility to work from home. Because we value in-person collaboration, you may occasionally be asked to join team events, workshops, or meetings at our offices in Berlin or Stuttgart, always with advance notice. We will discuss the exact balance openly during the hiring process.

• Work-life balance: We want you away from your desk, so we cover the cost of an EGYM Wellpass membership and provide company bike leasing (JobRad).

• Celebrate success: Join a team of highly motivated and engaged colleagues in a relaxed work environment.

• Hands-on impact: Take an active role in shaping Flip and contribute to the rapid growth of a young tech company while progressing towards your own goals. Positive vibes guaranteed.

• Happy to be a Flipster: Enjoy regular team events and Culture Days that foster camaraderie among us Flipsters.

• Working abroad: At Flip, you have the opportunity to work from other European countries — let's discuss workation options during the interview.

Senior Site Reliability Engineer

People also viewed

DevOps Reliability Engineer

Senior Site Reliability Engineer – Network

Staff Site Reliability Engineer

DevOps Engineer, Mid Level

DevOps Engineer, Azure

DevOps Engineer, mk8s