
Senior Site Reliability Engineer
Posted May 31

This is a fully remote position, open to applicants in Germany.
• Co-own the architecture: Help shape the architecture and development of our cloud infrastructure on Azure and our Kubernetes clusters, engineered for high throughput and high availability, to support Flip’s rapid global expansion.
• Drive the resilience strategy: Establish our methodology for global scaling, zero-downtime deployments, rollback procedures, and disaster recovery, ensuring the platform maintains 24/7 availability.
• Evolve our observability stack: Develop our LGTM stack (Loki, Grafana, Tempo, Mimir) into a reliable foundation for engineers.
• Improve our IaC platform: Reduce operational toil at its origin and transform our infrastructure into a genuine self-service option for engineering teams.
• Incident leadership: Take charge during significant platform incidents, conduct blameless post-mortems, and translate insights into enduring enhancements.
• Squad mentoring: Guide team members, lead RFCs and design reviews within the squad, and assist engineers in evolving into more proficient SREs.
• Shape our roadmap: Collaborate with your squad to establish the future direction of the platform.
• 5+ years of practical experience as a Site Reliability Engineer (SRE), Platform Engineer, DevOps Engineer, Infrastructure Engineer, Cloud Engineer, or Backend Engineer with a strong emphasis on infrastructure.
• Demonstrated history of constructing and managing high-throughput, highly available production systems.
• Extensive production-level experience with Kubernetes on a major hyperscaler.
• Solid experience with modern observability stacks (e.g., Prometheus, Mimir, VictoriaMetrics, dashboards, Loki, ELK) and a clear understanding of SLIs, SLOs, and error budgets.
• Strong software development capabilities in Go (preferred, as our IaC is built with Pulumi in Go) or Python.
• Practical experience with Infrastructure as Code (Pulumi, OpenTofu, Terraform) and GitOps (e.g., Argo CD) along with CI/CD pipeline design.
• Proven capacity to lead intricate infrastructure initiatives from design through to production — including authoring RFCs and steering architectural decisions within the team.
• Experience in mentoring engineers and enhancing the technical proficiency within a team.
• Confidence in taking end-to-end ownership during critical incidents and the ability to translate learnings into sustainable technical improvements.
• Strong communication skills and business-fluent English.
• Willingness to participate in on-call rotations to maintain the reliability of our platform.
• Work mode: We embrace a remote-first approach, allowing you the flexibility to work from home. Because we also value in-person collaboration, you may occasionally be asked to attend team events, workshops, or meetings at our offices in Berlin or Stuttgart, always with prior notice. The precise balance will be discussed openly during the hiring process.
• Work-life balance: We want you away from your desk, so we cover the cost of an EGYM Wellpass membership and provide company bike leasing (JobRad).
• Celebrate success: Join a team of highly motivated and engaged colleagues in a relaxed work environment.
• Hands-on impact: Take an active role in shaping Flip and contribute to the rapid growth of a young tech company while progressing towards your own goals. Positive vibes guaranteed.
• Happy to be a Flipster: Enjoy regular team events and Culture Days that foster camaraderie among us Flipsters.
• Working abroad: At Flip, you have the opportunity to work from other European countries — let's discuss workation options during the interview.