This is a fully remote position, open to applicants in Europe.

📋 Description

• Scale infrastructure as code. Design, implement, and manage infrastructure-as-code patterns using Terraform and Kubernetes that accommodate both standard connectors and custom builds. Make it simple for engineers to deploy and operate with confidence.

• Monitoring and incident management. Develop and sustain comprehensive monitoring, logging, and alerting systems. Lead incident response initiatives, conduct post-mortems, and promote ongoing enhancements in system reliability.

• Security and compliance integration. Collaborate with our Security team to incorporate security at every layer of Build infrastructure. Ensure compliance with requirements in more than 100 jurisdictions while minimizing friction for developers and customers.

• Performance and cost efficiency. Continuously optimize system performance, resource utilization, and cloud spend. Provide recommendations that improve both reliability and unit economics.

• Automation and operational efficiency. Identify manual operational tasks and systematically eliminate them. Develop tools and processes that enable teams to operate efficiently without increasing headcount.

• Platform reliability and developer satisfaction. Collaborate with platform teams to ensure APIs, MCP, and CLI are resilient and observable. Provide infrastructure feedback that influences the evolution of the platform.

⛳️ Requirements

• Senior-level SRE expertise: proven experience in a Site Reliability Engineering, DevOps Engineering, or SysOps role. You have established and managed production systems at scale.

• Kubernetes and AWS: extensive, hands-on experience with Kubernetes in a production environment. Strong AWS fundamentals across compute, networking, storage, and managed services.

• Infrastructure-as-code proficiency: experience with Terraform or similar IaC tools. You define infrastructure in code rather than through console clicks.

• CI/CD and deployment automation: practical experience in setting up and managing GitLab, GitHub Actions, Jenkins, or similar tools. You grasp deployment strategies, rollback mechanisms, and safety nets.

• Scripting and systems expertise: strong bash scripting skills. Comfortable with debugging system-level issues, analyzing logs, and understanding the basics of the Linux kernel.

• Excellent communication: you articulate complex infrastructure decisions clearly to both technical and non-technical stakeholders. You produce clear runbooks and documentation.

• Nice to have: Experience with one or more backend programming languages (Elixir, Python, Go, Java, Node.js, etc.).

• Nice to have: Experience in consultancy environments.

• Nice to have: Familiarity with container registry and artifact management (ECR, Docker Hub, etc.).

• Nice to have: Depth in observability stacks (Datadog, Prometheus, ELK, Grafana, or similar).

• Nice to have: Experience working with or scaling multi-tenant platforms.

🏝️ Benefits

• Work from anywhere

• Flexible paid time off

• Flexible working hours (we operate asynchronously)

• 16 weeks of paid parental leave

• Mental health support services

• Stock options

• Learning budget

• Home office budget and IT equipment

• Budget for local in-person social events or co-working spaces

Senior Site Reliability Engineer – Build