This is a fully remote position, open to applicants in Brazil.

📋 Description

• Lead the design of Infrastructure as Code (IaC) architecture, including Terraform module design, state management, multi-account patterns, and establishing standards for the team to follow.

• Spearhead GitOps implementation at scale through ArgoCD configuration, progressive delivery patterns, promotion workflows, and ensuring deployment reliability across various environments and tenants.

• Architect and manage multi-tenant Kubernetes infrastructure on AWS EKS, focusing on tenant isolation, workload placement, cluster topology, and long-term scalability strategies.

• Develop self-service infrastructure automation by creating provisioning pipelines, managing configurations, and building platform capabilities that engineering teams can utilize without manual assistance.

• Drive the use of agentic coding tools for infrastructure tasks, including scaffolding new environments, generating and reviewing IaC, enhancing automation, and setting patterns for the team.

• Take ownership of reliability, including defining Service Level Objectives (SLOs), managing error budgets, ensuring high-quality incident response, and fostering a feedback loop that turns incidents into platform enhancements.

• Establish observability standards, focusing on trace coverage, alert quality, on-call ergonomics, and a culture of runbooks.

• Collaborate with the security team on zero-trust architecture, scaling secrets management, and hardening infrastructure.

• Contribute to the technical roadmap and assist the team in prioritizing essential tasks.

• Mentor mid-level engineers through code reviews, design feedback, and on-call shadowing.

⛳️ Requirements

• 6+ years of experience in platform engineering, Site Reliability Engineering (SRE), or infrastructure, with significant time spent operating production systems at scale.

• Extensive expertise in IaC — proficient in designing Terraform architectures rather than merely writing modules; experienced in managing complex state and multi-account configurations in production.

• Strong background in GitOps, with a deep understanding of declarative infrastructure management and well-formed opinions on best practices.

• In-depth knowledge of Kubernetes, with experience operating production clusters, handling real failure scenarios, and understanding control plane intricacies.

• Solid AWS expertise, covering networking, compute, IAM, storage, and multi-account design.

• Familiarity with multi-tenant infrastructure, including isolation patterns, noisy neighbor mitigation, and tenant lifecycle management.

• Senior-level automation-first mindset — capable of designing systems that eliminate entire categories of manual work rather than just individual tasks.

• Active engagement with agentic coding tools — adept at directing their use, critically reviewing their outputs, and leveraging them to enhance productivity.

• Proven track record in reliability engineering, having defined and measured SLOs, conducted post-mortems, and driven measurable improvements.

• Excellent communication skills — able to make architectural decisions understandable to both engineers and leadership.

🏝️ Benefits

• Opportunities for professional development.

Senior Platform Engineer – SRE

