Remotery

Senior Site Reliability Engineer

Posted Jun 20

This is a fully remote position, open to applicants in Colombia.

📋 Description

• Define and uphold SLIs/SLOs, and manage error budget alignment and consumption.

• Lead incident responses and conduct postmortems, implementing corrective actions.

• Automate operational tasks through tooling (e.g., auto-remediation, scaling rules).

• Develop, enhance, and sustain CI/CD pipelines, including canary deployments and blue/green strategies.

• Facilitate technical discussions with clients to ensure alignment on reliability, scalability, and performance needs.

• Drive ongoing platform improvements throughout the service lifecycle, covering architecture, monitoring, and operational processes.

• Implement and expand observability systems (metrics, tracing, log aggregation).

• Optimize performance and cost by fine-tuning cloud services, autoscaling, and resource rightsizing.

• Design, deploy, and manage containerized workloads using Docker and Kubernetes in production environments.

• Collaborate with development teams to integrate resilience patterns (circuit breakers, bulkheading).

• Engage in architecture discussions focused on high availability and disaster recovery.

• Mentor mid-level and junior SREs; perform reliability design reviews.


⛳️ Requirements

• 5–8 years of experience in a reliability or operations position.

• Cloud-agnostic certification: Terraform Associate, Certified Kubernetes Administrator (CKA), or SRE Foundation.

• Cloud provider certification: Professional-level certification in AWS (Solutions Architect), Azure (Solutions Architect Expert), GCP (Professional Cloud Architect), or Oracle Cloud (Architect Professional).

• Strong coding capabilities (Python, Go, or equivalent).

• Experience with Infrastructure as Code (IaC) and CI/CD pipelines.

• Proficient with monitoring/observability stacks (Prometheus, Grafana, OpenTelemetry, ELK, Jaeger).

• Experience in distributed systems and production-scale services.


🏝️ Benefits

• Competitive salary and performance-based bonuses.

• Comprehensive health, dental, and vision insurance.

• Flexible work hours and remote working options.

• Professional development and continuous learning opportunities.

• Collaborative and inclusive work environment.
