Remotery

Senior Site Reliability Engineer – Build

Posted 2 days ago

This is a fully remote position, open to applicants in Europe.

📋 Description

• Scale infrastructure as code. Design, implement, and manage infrastructure-as-code patterns using Terraform and Kubernetes that accommodate both standard connectors and custom builds. Simplify deployment and increase operational confidence for engineers.

• Monitoring and incident management. Develop and sustain comprehensive monitoring, logging, and alerting systems. Lead incident response initiatives, conduct post-mortems, and promote ongoing enhancements in system reliability.

• Security and compliance integration. Collaborate with our Security team to incorporate security at every layer of Build infrastructure. Ensure compliance with requirements in more than 100 jurisdictions while minimizing friction for developers and customers.

• Performance and cost efficiency. Continuously optimize system performance, resource utilization, and cloud spend. Provide recommendations that improve both reliability and unit economics.

• Automation and operational efficiency. Identify manual operational tasks and systematically eliminate them. Develop tools and processes that enable teams to operate efficiently without increasing headcount.

• Platform reliability and developer satisfaction. Collaborate with platform teams to ensure APIs, MCP, and CLI are resilient and observable. Provide infrastructure feedback that influences the evolution of the platform.


⛳️ Requirements

• Senior-level SRE expertise: proven experience in a Site Reliability Engineering, DevOps Engineering, or SysOps role. You have established and managed production systems at scale.

• Kubernetes and AWS: extensive, hands-on experience with Kubernetes in a production environment. Strong AWS fundamentals across compute, networking, storage, and managed services.

• Infrastructure-as-code proficiency: experience with Terraform or similar IaC tools. You define infrastructure in code rather than relying on console clicks.

• CI/CD and deployment automation: practical experience in setting up and managing GitLab, GitHub Actions, Jenkins, or similar tools. You understand deployment strategies, rollback mechanisms, and safety nets.

• Scripting and systems expertise: strong bash scripting skills. Comfortable with debugging system-level issues, analyzing logs, and understanding the basics of the Linux kernel.

• Excellent communication: you articulate complex infrastructure decisions clearly to both technical and non-technical stakeholders. You produce clear runbooks and documentation.

• Nice to have: Experience with one or more backend programming languages (Elixir, Python, Go, Java, Node.js, etc.).

• Nice to have: Experience in consultancy environments.

• Nice to have: Familiarity with container registry and artifact management (ECR, Docker Hub, etc.).

• Nice to have: Depth in observability stacks (Datadog, Prometheus, ELK, Grafana, or similar).

• Nice to have: Experience with multi-tenant platforms, including scaling them.


🏝️ Benefits

• Work from anywhere

• Flexible paid time off

• Flexible working hours (we operate asynchronously)

• 16 weeks of paid parental leave

• Mental health support services

• Stock options

• Learning budget

• Home office budget and IT equipment

• Budget for local in-person social events or co-working spaces
