Remotery

Senior SRE Engineer

Posted May 21

This is a fully remote position, open to applicants in Spain.

📋 Description

• Oversee and maintain container orchestration platforms and containerized workloads.

• Monitor and resolve issues in production systems, participating in on-call rotations to ensure operational reliability.

• Drive observability improvements by enhancing monitoring, logging, and alerting across systems and data platforms.

• Administer and optimize cloud environments across various providers.

• Manage and support distributed data platforms along with real-time processing systems.

• Design and maintain continuous integration and delivery pipelines to facilitate efficient and dependable deployments.

• Lead and apply Infrastructure as Code (IaC) methodologies to ensure consistency and scalability.

• Automate and orchestrate infrastructure using programming and scripting languages.

• Execute system administration and networking responsibilities to support both internal and external environments.

• Collaborate effectively with engineers and stakeholders across different time zones.


⛳️ Requirements

• A minimum of 5 years of experience in Site Reliability Engineering, DevOps, or Platform Engineering roles.

• Demonstrated success in managing large-scale production systems within cloud environments (AWS, GCP, Azure, or OCI).

• Proven leadership in driving incident response, implementing on-call best practices, and fostering a reliability-focused culture.

• Strong background in production on-call operations and incident management.

• Advanced skills in Kubernetes administration and troubleshooting.

• Practical experience with observability tools such as Prometheus, Grafana, Loki, and Alertmanager.

• Familiarity with chat-based operational interfaces and/or auto-remediation controllers using AI agent frameworks.

• Understanding of AI agents for auto-triaging alerts, correlating signals, and suggesting root-cause hypotheses.

• Expertise in operating data platforms like Elasticsearch, MongoDB, Spark, Kafka, and Redis.

• Proficiency in public cloud services (AWS, Azure, GCP, or OCI).

• Strong programming and automation capabilities in Python and Bash.

• In-depth knowledge of Infrastructure as Code tools (Terraform, Helm).

• Experience with CI/CD pipelines (GitHub Actions, Bitbucket, ArgoCD).

• Solid technical foundation in distributed systems, databases, networking, and Linux administration.

• Exceptional problem-solving, communication, and leadership skills.

• A Bachelor's degree in Computer Science, Engineering, or a related technical field.

• Relevant certifications in AWS, GCP, Observability, Linux, or Kubernetes are advantageous.


🏝️ Benefits

• Competitive salary and performance-based bonuses.

• Comprehensive health, dental, and vision insurance.

• Generous paid time off and flexible working hours.

• Opportunities for professional development and training.

• Collaborative and inclusive work environment.
