This is a fully remote position, open to applicants in Brazil.

📋 Description

• Design, construct, and operate cloud infrastructure, CI/CD pipelines, and developer platforms that support digital innovation initiatives.

• Build and maintain infrastructure for the lifecycle management of AI/ML models: training environments, model serving, and production monitoring.

• Ensure that deploying an AI model into production is as reliable, repeatable, and observable as a traditional software service deployment.

• Implement deployment strategies (blue/green, canary, phased updates, and feature flags) for both traditional services and AI model endpoints.

• Build and maintain a comprehensive observability stack: metrics, logs, traces, and AI-specific monitoring.

• Design and implement security policies as code and identity management.

⛳️ Requirements

• Over 6 years of experience in DevOps, SRE, or platform engineering.

• Experience with infrastructure as code: Terraform (primary), with exposure to Pulumi, CloudFormation, or Bicep.

• Proficiency in Kubernetes (EKS, AKS, or GKE): cluster management, Helm charts, operators, auto-scaling, and troubleshooting.

• In-depth experience with CI/CD pipeline design using GitHub Actions, GitLab CI, Azure DevOps Pipelines, or Jenkins, including multi-stage pipelines with automated quality gates.

• Strong cloud infrastructure experience on at least two of AWS, Azure, and GCP, with practical skills in networking, compute, storage, identity, and security services.

• Proficient in scripting and automation: Python, Bash, PowerShell, and at least one of Go or TypeScript.

• Experience building observability stacks: Prometheus, Grafana, Datadog, ELK, OpenTelemetry, and alerting/incident management systems (PagerDuty, Opsgenie).

• Solid understanding of security engineering: secret management, network security, IAM, container security, and compliance automation.

• Experience with GitOps practices and tools: ArgoCD, Flux, or equivalent.

• Fluent in English, both written and spoken.

• Proven experience in international projects, including collaboration with global and multicultural teams.

• Strong communication skills, stakeholder management, and problem-solving abilities.

• Previous experience mentoring engineers or serving as a technical lead is highly preferred.

• Hands-on experience in MLOps (model serving, GPU infrastructure management), plus knowledge of chaos engineering tools such as Chaos Monkey.

• A Bachelor's degree in Computer Science, Information Systems, Engineering, or a related field is preferred.

🏝️ Benefits

• 100% Remote

DevOps/Platform Engineer

