Remotery

Senior Site Reliability Engineer

Posted May 24

This is a fully remote position, open to applicants in Italy.

📋 Description

• Engage in hands-on Reliability & System Engineering by designing, constructing, and managing reliable and scalable systems, defining and monitoring SLOs/SLIs, directly working on production infrastructure, and collaborating closely with software engineers to enhance system design and reliability.

• Focus on Automation, Operations & Incident Response by developing automation for infrastructure and operational workflows to minimize toil and reduce MTTR. Participate in and lead incident response, and conduct blameless post-incident reviews with clear follow-ups implemented in code and tooling.

• Work across Performance, Capacity & Security by analyzing and optimizing system performance and cost, providing data, insights, and recommendations for capacity planning, and supporting security best practices through direct involvement in vulnerability remediation and threat mitigation.


⛳️ Requirements

• Possess hands-on experience with SRE practices in production environments, with strong expertise in AWS, Kubernetes, networking, DNS, and Infrastructure as Code (Pulumi preferred; knowledge of Terraform is a plus).

• Exhibit a strong foundation in Automation & Software Engineering, including proficiency in Python and in-depth knowledge of the Python ecosystem (testing, debugging, packaging), along with a consistent focus on writing clean, well-structured, and maintainable code.

• Demonstrate skills in Reliability, Data & Operations by engaging stakeholders, mentoring others, leading incident response and root cause analyses (RCAs), improving system reliability, and proposing solutions and sharing insights.

• Nice-to-Have: Experience in highly regulated industries (such as Insurance, Banking, Healthcare), managing sensitive data, and supporting secure networking configurations, with familiarity in security technologies like Cloudflare.

• Have a solid understanding of microservices architectures, including their principles and trade-offs.

• Have hands-on experience with Datadog for platform and application monitoring and performance optimization, along with a strong foundation in database structures.


🏝️ Benefits

• Work Your Way: Enjoy full flexibility – work from home, the office, or a combination of both. Additionally, work from anywhere for up to 30 days each year.

• Grow with us: Access learning resources, mentorship, and a personalized growth plan tailored to your development.

• Thrive and perform: Benefit from private healthcare, gym discounts, wellbeing programs, and mental health support.
