Remotery

Senior Site Reliability Engineer

Posted Jun 1

This is a fully remote position, open to applicants in Arizona and 31 other states.

📋 Description

• Executing daily operational and DevOps duties on Wikimedia’s publicly accessible infrastructure, including deployment, maintenance, configuration, and troubleshooting.

• Utilizing and implementing configuration management and deployment tools such as Puppet and Kubernetes.

• Driving continuous improvements by automating the installation, configuration, and upkeep of services on our platform.

• Collaborating closely with product teams to deliver scalable functionality to users by contributing to the architectural design of new services.

• Engaging in a 24/7 on-call rotation shared with the wider SRE team, which involves participating in incident response, diagnosing issues, and following up on system outages or alerts within Wikimedia’s production infrastructure.

• Working in partnership with a global, cross-functional team in an asynchronous communication environment.

• Providing mentorship to peers in your areas of technical and operational expertise.


⛳️ Requirements

• Over 6 years of experience in an SRE, Operations, or DevOps role within a team setting.

• Proficiency in shell scripting and at least one programming language commonly used in SRE work (such as Python, Go, Bash, or Ruby; we primarily use Python), and familiarity with configuration management tools such as Puppet or Ansible (we primarily use Puppet).

• Knowledge of distributed caching systems, including their underlying algorithms and performance optimization techniques.

• Experience with package management on Linux systems, specifically Debian.

• Strong troubleshooting skills at the Linux system level.

• Proven track record of identifying process inefficiencies and opportunities for automation, and of automating tasks and processes accordingly.

• Excellent English language skills, both verbal and written, along with the ability to work independently as an effective member of a globally distributed team across multiple time zones.

• Experience in leading and participating in incident response and post-incident review processes, focusing on conducting root cause analysis and implementing preventive measures.


🏝️ Benefits

• Competitive salary.

• Comprehensive health insurance.

• Flexible work arrangements.

• Generous paid time off.

• Opportunities for professional development.

People also viewed

N2JSoft, administrative and HR software · 11 hours ago

Experienced DevOps Engineer (DevOps confirmé)

France only · Full-time · DevOps & Site Reliability Engineer (SRE) · €60k/year

It's Prodigy · 12 hours ago

DevOps Engineer, Cloud

Anywhere in the world · Full-time · DevOps & Site Reliability Engineer (SRE)

ARA · 1 day ago

Senior Site Reliability Engineer

New Mexico only · Full-time · DevOps & Site Reliability Engineer (SRE)

Kenlo · 2 days ago

Infrastructure Analyst, SRE, DevOps (Analista de Infraestrutura, SRE, DevOps)

Brazil only · Full-time · DevOps & Site Reliability Engineer (SRE)

Ad Hoc LLC · 2 days ago

Senior Site Reliability Engineer

North America · Full-time · DevOps & Site Reliability Engineer (SRE) · $135k – $150k/year

Assured · 2 days ago

Staff Database Reliability Engineer, DBRE

United States only · Full-time · DevOps & Site Reliability Engineer (SRE) · $165k – $185k/year
