
SRE Specialist – Platform Engineering
Posted 6 days ago

This is a fully remote position, open to applicants in Brazil.
• **Responsibilities:**
• Develop and maintain the enterprise Kubernetes platform (AKS/EKS), ensuring the scalability, security, and high availability of its environments;
• Build and improve infrastructure and operations automation using Infrastructure as Code (IaC) and GitOps practices;
• Create and maintain CI/CD pipelines, supporting teams as they adopt continuous delivery;
• Implement observability, monitoring, and distributed tracing solutions to ensure service visibility and reliability;
• Respond to critical incidents, perform root cause analysis, and drive continuous improvements to the platform;
• Help development teams adopt cloud, Kubernetes, observability, and automation best practices;
• Evolve the internal engineering platform to improve developer experience and accelerate the delivery of business value;
• Implement and refine autoscaling strategies, capacity management, and operational efficiency practices in cloud environments;
• Collaborate with cross-functional teams to define architecture, security, and governance standards for Azure and AWS environments;
• Evaluate, test, and adopt new solutions and technologies in Platform Engineering, SRE, and enterprise automation.
• **Requirements:**
• Bachelor's degree;
• Extensive experience in administering and evolving Kubernetes environments, particularly on managed platforms such as AKS (Azure Kubernetes Service) and/or EKS (Amazon Elastic Kubernetes Service);
• Proficiency in public cloud environments (Azure and/or AWS), including infrastructure, networking, and security services;
• Proven experience implementing and maintaining CI/CD pipelines and GitOps practices with tools such as GitHub Actions or similar;
• Advanced knowledge of Infrastructure as Code (IaC), using Terraform, Crossplane, or equivalent tools for provisioning and infrastructure governance;
• Familiarity with modern observability, monitoring, logging, and distributed tracing solutions, using tools such as Grafana, Prometheus, OpenTelemetry, Loki, Tempo, or similar;
• Strong expertise in Linux, containers, and Docker, including troubleshooting and optimizing containerized environments;
• Experience with automation and scripting in Bash, PowerShell, Python, or equivalent languages;
• Knowledge of networking, DNS, load balancers, connectivity, and security in cloud environments;
• Ability to analyze and resolve issues in distributed, mission-critical environments;
• Experience building, operating, and evolving corporate platforms with a focus on reliability, scalability, automation, and developer experience.
• **Preferred Qualifications:**
• Familiarity with Argo CD and the GitOps ecosystem;
• Understanding of Argo Workflows and Argo Events for orchestration and process automation;
• Experience with Karpenter, Cluster Autoscaler, or other advanced Kubernetes autoscaling solutions;
• Knowledge of Service Mesh technologies such as Istio, Linkerd, or similar;
• Understanding of FinOps, capacity management, and cost optimization in cloud environments;
• Experience with AIOps initiatives, intelligent automation, and applying AI to platform operations;
• Familiarity with Terragrunt, Crossplane, and advanced infrastructure management tools;
• Experience with distributed observability, tracing, and performance analysis for large-scale applications;
• Background in hybrid architectures and multi-cloud environments.
• **Benefits:**
• Competitive salary and performance-based bonuses;
• Comprehensive health, dental, and vision insurance;
• Flexible working hours and remote working options;
• Opportunities for professional development and continuous learning;
• Collaborative and inclusive company culture;
• Paid time off and holidays.