Remotery

SRE Partner – Affirmative Action Position for Persons with Disabilities

Posted May 23

This is a fully remote position, open to applicants in Brazil.

📋 Description

• Ensure the reliability, availability, and scalability of systems and services within the assigned Product Areas (PAs).

• Design and implement solutions for monitoring, observability, and alerting that integrate with the Agentic Engineering Platform.

• Assist teams in defining and tracking Service Level Indicators (SLIs), Service Level Objectives (SLOs), and error budgets.

• Organize and enhance on-call management within the PAs, including rotation, escalation procedures, alerting tools, and incident management.

• Collaborate closely with the Engineering Platform to ensure that platform capabilities are effectively utilized and adopted by product teams.

• Actively contribute to the evolution of the Agentic Engineering Platform by relaying real feedback from the PAs on friction points, gaps, and opportunities for improvement.

• Participate in and help foster a Site Reliability Engineering (SRE) culture throughout the organization.

• Assist in the migration of critical systems, the segregation of environments, and the deprecation of legacy technologies.


⛳️ Requirements

• Experience working with cloud environments, ideally Google Cloud Platform (GCP).

• Expertise in observability tools and methodologies (Prometheus, Grafana, Loki, Thanos, Elasticsearch, Alertmanager, etc.).

• Strong understanding of Kubernetes and distributed system architectures.

• Solid knowledge of Infrastructure as Code (IaC) principles and Terraform.

• Practical experience in incident management, on-call duties, and conducting post-mortems.

• Experience in defining and tracking SLOs and error budgets.

• Ability to analyze logs and assess the performance of distributed systems.

• Excellent communication and influencing skills, with the ability to advocate for technical solutions to a variety of audiences, including engineers, product managers, and leadership.

• A data-driven approach: using data to identify risks, prioritize actions, and demonstrate impact.


🏝️ Benefits

• N/A

