Remotery

Senior Site Reliability Engineer

Posted May 19

This is a fully remote position, open to applicants in Mexico.

📋 Description

• Accountable for designing, constructing, maintaining, and scaling production services and server farms across various data centers for intricate and data-heavy cloud services.

• Develop and refine software architecture to enhance scalability, service reliability, capacity, and performance.

• Create automation scripts for provisioning and managing infrastructure at scale. You are not just an operator but a seasoned software engineer focused on operational excellence.

• Collaborate with development teams to ensure applications integrate seamlessly within the infrastructure, with scalability and reliability designed and implemented from the outset. Engage with QA to build pipelines and automation for application delivery and deployment to production.

• Actively troubleshoot incidents, develop theories, test hypotheses, and narrow down possibilities to identify root causes.

• Produce postmortem reviews and offer remediation recommendations.

• Detect adverse trends before they escalate into issues; respond to automated system alerts, efficiently troubleshoot system errors, and manage incidents to restore systems to normal operating conditions.

• Author and maintain comprehensive documentation of all relevant specifications, systems, and procedures.

• Support and adhere to the company’s Quality Management System policies and procedures.


⛳️ Requirements

• Bachelor’s degree (or equivalent) in computer science or a related field.

• Familiarity with IaC technologies such as Terraform, Ansible, Puppet, and Chef.

• Expertise in cluster creation and management through Kubernetes.

• Proficiency in Microsoft Azure, AWS, and Google Cloud, including Azure services, Azure Virtual Machines, and virtual network configuration.

• Understanding of cloud service models such as IaaS, PaaS, and SaaS.

• Knowledge of CI/CD methodologies.

• Scripting skills with PowerShell.

• Understanding of IP addressing and subnet masks.

• Ability to program (both structured and OOP) using one or more high-level languages, including Python, Java, C/C++, Ruby, and JavaScript.

• Experience with distributed storage technologies like NFS, HDFS, Ceph, and Amazon S3, along with dynamic resource management frameworks (Apache Mesos, Kubernetes, YARN).

• A proactive approach to identifying issues, performance bottlenecks, and opportunities for improvement.


🏝️ Benefits

• Competitive base salary and a permanent contract directly with the company.

• Ongoing training plan with paid certifications.

• Career development plan tailored to your growth and expertise.

• Additional benefits beyond the legal requirements: 12 days of Paid Time Off, a 30-day Christmas Bonus, Medical Insurance, Life Insurance, Savings Fund, and Groceries Bonus.

• Quarterly Performance Bonus.

• Provision of computer equipment for your work.

• Option for 100% remote work.
