This is a fully remote position, open to applicants in Arizona and 31 other states.

📋 Description

• Executing daily operational and DevOps duties on Wikimedia’s publicly accessible infrastructure, including deployment, maintenance, configuration, and troubleshooting.

• Utilizing and implementing configuration management and deployment tools such as Puppet and Kubernetes.

• Driving continuous improvements by automating the installation, configuration, and upkeep of services on our platform.

• Collaborating closely with product teams to deliver functionality to users, contributing to the architectural design of new services and ensuring that they scale.

• Engaging in a 24/7 on-call rotation shared with the wider SRE team, which involves participating in incident response, diagnosing issues, and following up on system outages or alerts within Wikimedia’s production infrastructure.

• Working in partnership with a global, cross-functional team in an asynchronous communication environment.

• Providing mentorship to peers in your areas of technical and operational expertise.

⛳️ Requirements

• Over 6 years of experience in an SRE, Operations, or DevOps role within a team setting.

• Proficiency in shell scripting and in at least one programming language used in an SRE context (such as Python, Go, Bash, or Ruby, with a primary focus on Python), along with familiarity with configuration management tools (such as Puppet and Ansible; we primarily use Puppet).

• Knowledge of distributed caching systems, including their underlying algorithms and performance optimization techniques.

• Experience with package management on Linux systems, specifically Debian.

• Strong troubleshooting skills at the Linux system level.

• Proven track record in automating tasks and processes, identifying process inefficiencies, and discovering opportunities for automation.

• Excellent English language skills, both verbal and written, along with the ability to work independently as an effective member of a globally distributed team across multiple time zones.

• Experience in leading and participating in incident response and post-incident review processes, focusing on conducting root cause analysis and implementing preventive measures.

🏝️ Benefits

• Competitive salary.

• Comprehensive health insurance.

• Flexible work arrangements.

• Generous paid time off.

• Opportunities for professional development.

Senior Site Reliability Engineer
