
Senior Platform Engineer – SRE
Posted May 24

This is a fully remote position, open to applicants in Brazil.
• Lead the design of Infrastructure as Code (IaC) architecture, including Terraform module design, state management, multi-account patterns, and establishing standards for the team to follow.
• Spearhead GitOps implementation at scale through ArgoCD configuration, progressive delivery patterns, promotion workflows, and ensuring deployment reliability across various environments and tenants.
• Architect and manage multi-tenant Kubernetes infrastructure on AWS EKS, focusing on tenant isolation, workload placement, cluster topology, and long-term scalability strategies.
• Develop self-service infrastructure automation by creating provisioning pipelines, managing configurations, and building platform capabilities that engineering teams can utilize without manual assistance.
• Drive the use of agentic coding tools for infrastructure tasks, including scaffolding new environments, generating and reviewing IaC, enhancing automation, and setting patterns for the team.
• Take ownership of reliability, including defining Service Level Objectives (SLOs), managing error budgets, ensuring high-quality incident response, and fostering a feedback loop that turns incidents into platform improvements.
• Establish observability standards, focusing on trace coverage, alert quality, on-call ergonomics, and a culture of runbooks.
• Collaborate with the security team on zero-trust architecture, scaling secrets management, and hardening infrastructure.
• Contribute to the technical roadmap and assist the team in prioritizing essential tasks.
• Mentor mid-level engineers through code reviews, design feedback, and on-call shadowing.
• 6+ years of experience in platform engineering, Site Reliability Engineering (SRE), or infrastructure, with significant time spent operating production systems at scale.
• Extensive IaC expertise: able to design Terraform architectures rather than merely write modules, with experience managing complex state and multi-account configurations in production.
• Strong background in GitOps: a deep understanding of declarative infrastructure management and well-formed opinions on best practices.
• In-depth knowledge of Kubernetes: experience operating production clusters, handling real failure scenarios, and understanding control plane internals.
• Solid AWS expertise, covering networking, compute, IAM, storage, and multi-account design.
• Familiarity with multi-tenant infrastructure, including isolation patterns, noisy neighbor mitigation, and tenant lifecycle management.
• Senior-level automation-first mindset — capable of designing systems that eliminate entire categories of manual work rather than just individual tasks.
• Active engagement with agentic coding tools — adept at directing their use, critically reviewing their outputs, and leveraging them to enhance productivity.
• Proven track record in reliability engineering: has defined and measured SLOs, conducted post-mortems, and driven measurable improvements.
• Excellent communication skills — able to make architectural decisions understandable to both engineers and leadership.
• Opportunities for professional development.