Remotery

SRE Partner

Posted May 19

This is a fully remote position, open to applicants in Brazil.

📋 Description

• Ensure the reliability, availability, and scalability of the systems and services within the assigned Product Areas (PAs).

• Develop and implement monitoring, observability, and alerting solutions that are integrated with the Agentic Engineering Platform.

• Assist teams in defining and tracking Service Level Indicators (SLIs), Service Level Objectives (SLOs), and error budgets.

• Design and enhance on-call management processes across Product Areas, including rotations, escalation procedures, alerting tools, and incident management.

• Collaborate closely with the Engineering Platform to ensure platform capabilities are effectively utilized and embraced by product teams.

• Actively contribute to the evolution of the Agentic Engineering Platform by relaying candid feedback from Product Areas on challenges, gaps, and opportunities for improvement.

• Help build and shape a reliability-focused (SRE) engineering culture throughout the organization.

• Assist with the migration of critical systems, environment segregation, and the phase-out of outdated technologies.


⛳️ Requirements

• Experience with cloud environments, ideally Google Cloud Platform (GCP).

• Proficiency in observability tools and practices such as Prometheus, Grafana, Loki, Thanos, Elasticsearch, and Alertmanager.

• Strong understanding of Kubernetes and distributed systems architecture.

• Solid knowledge of Infrastructure as Code (IaC) and Terraform.

• Practical experience with incident management, on-call processes, and conducting post-mortems.

• Experience in defining and tracking SLOs and error budgets.

• Ability to analyze logs and assess the performance of distributed systems.

• Excellent communication and influencing skills, with the ability to advocate for technical solutions to diverse audiences, including engineers, Product Managers, and leadership.

• A data-driven approach, using metrics to assess risks, prioritize actions, and demonstrate impact.


🏝️ Benefits

• Health insurance

• 401(k) matching

• Paid time off

• Flexible work hours
