Remotery

SRE Specialist – Platform Engineering

Posted 6 days ago

This is a fully remote position, open to applicants in Brazil.

📋 Description

• Develop and sustain the enterprise Kubernetes platform (AKS/EKS), guaranteeing scalability, security, and high availability of the environments;

• Build and improve infrastructure and operations automation through Infrastructure as Code and GitOps practices;

• Create and maintain CI/CD pipelines, assisting teams in their continuous delivery journey;

• Implement observability, monitoring, and distributed tracing solutions to ensure the visibility and reliability of services;

• Address critical incidents, conduct root cause analysis, and implement ongoing enhancements to the platform;

• Support development teams in adopting cloud, Kubernetes, observability, and automation best practices;

• Advance the internal engineering platform to enhance developer experience and expedite the delivery of business value;

• Apply and refine autoscaling strategies, capacity management, and operational efficiency for cloud environments;

• Collaborate with cross-functional teams to establish architecture, security, and governance standards for Azure and AWS environments;

• Evaluate, test, and implement new solutions and technologies focused on Platform Engineering, SRE, and enterprise automation.


⛳️ Requirements

• Bachelor's degree;

• Extensive experience in administering and evolving Kubernetes environments, particularly on managed platforms such as AKS (Azure Kubernetes Service) and/or EKS (Amazon Elastic Kubernetes Service);

• Proficiency in public cloud environments (Azure and/or AWS), including infrastructure, networking, and security services;

• Proven experience in implementing and maintaining CI/CD pipelines and GitOps practices, utilizing tools like GitHub Actions or similar;

• Advanced understanding of Infrastructure as Code (IaC), employing Terraform, Crossplane, or equivalent tools for provisioning and infrastructure governance;

• Familiarity with modern observability, monitoring, logging, and distributed tracing solutions, utilizing tools such as Grafana, Prometheus, OpenTelemetry, Loki, Tempo, or similar;

• Strong expertise in Linux, containers, and Docker, including troubleshooting and optimizing containerized environments;

• Experience in automation and scripting using Bash, PowerShell, Python, or other equivalent languages;

• Knowledge of networking, DNS, load balancers, connectivity, and security within cloud environments;

• Ability to analyze and resolve issues in distributed, mission-critical environments;

• Experience building, operating, and evolving corporate platforms with a focus on reliability, scalability, automation, and developer experience.

Preferred Qualifications:

• Familiarity with Argo CD and the GitOps ecosystem;

• Understanding of Argo Workflows and Argo Events for orchestration and process automation;

• Experience with Karpenter, Cluster Autoscaler, or other advanced Kubernetes autoscaling solutions;

• Knowledge of Service Mesh technologies such as Istio, Linkerd, or similar;

• Understanding of FinOps, capacity management, and cost optimization in cloud environments;

• Involvement in AIOps initiatives, intelligent automation, and applying AI to platform operation;

• Familiarity with Terragrunt, Crossplane, and other advanced infrastructure management tools;

• Experience with distributed observability, tracing, and performance analysis for large-scale applications;

• Background in hybrid architectures and multi-cloud environments.


🏝️ Benefits

• Competitive salary and performance-based bonuses;

• Comprehensive health, dental, and vision insurance;

• Flexible working hours and remote working options;

• Opportunities for professional development and continuous learning;

• Collaborative and inclusive company culture;

• Paid time off and holidays.
