This is a fully remote position, open to applicants in Mexico.

📋 Description

• Accountable for designing, constructing, maintaining, and scaling production services and server farms across various data centers for intricate and data-heavy cloud services.

• Develop and refine software architecture to enhance scalability, service reliability, capacity, and performance.

• Create automation scripts for provisioning and managing infrastructure at scale. You are not just an operator but a seasoned software engineer focused on operational excellence.

• Collaborate with development teams to ensure applications integrate seamlessly within the infrastructure, with scalability and reliability designed and implemented from the outset. Engage with QA to build pipelines and automation for application delivery and deployment to production.

• Actively troubleshoot incidents, develop theories, test hypotheses, and narrow down possibilities to identify root causes.

• Produce postmortem reviews and offer remediation recommendations.

• Detect adverse trends before they escalate into issues; respond to automated system alerts, efficiently troubleshoot system errors, and manage incidents to restore systems to normal operating conditions.

• Author and maintain comprehensive documentation of all relevant specifications, systems, and procedures.

• Support and adhere to the company’s Quality Management System policies and procedures.

⛳️ Requirements

• Bachelor’s degree (or equivalent) in computer science or a related field.

• Familiarity with IaC technologies such as Terraform, Ansible, Puppet, and Chef.

• Expertise in cluster creation and management through Kubernetes.

• Proficiency with Microsoft Azure, AWS, and Google Cloud, including Azure services such as Virtual Machines and virtual network configuration.

• Understanding of cloud service models such as IaaS, PaaS, and SaaS.

• Knowledge of CI/CD methodologies.

• Scripting skills with PowerShell.

• Understanding of IP addressing and subnet masks.

• Ability to program (both structured and OOP) using one or more high-level languages, including Python, Java, C/C++, Ruby, and JavaScript.

• Experience with distributed storage technologies like NFS, HDFS, Ceph, and Amazon S3, along with dynamic resource management frameworks (Apache Mesos, Kubernetes, YARN).

• A proactive approach to identifying issues, performance bottlenecks, and opportunities for improvement.

🏝️ Benefits

• Competitive base salary and a permanent contract directly with the company.

• Ongoing training plan with paid certifications.

• Career development plan tailored to your growth and expertise.

• Additional benefits beyond the legal requirements: 12 days of Paid Time Off, a 30-day Christmas Bonus, Medical Insurance, Life Insurance, Savings Fund, and Groceries Bonus.

• Quarterly Performance Bonus.

• Provision of computer equipment for your work.

• Option for 100% remote work.

Senior Site Reliability Engineer
