Remotery

Senior Site Reliability Engineer, Infrastructure Foundations

Posted 23 hours ago

📋 Description

• Executing daily operational and DevOps duties on Wikimedia’s public-facing infrastructure, including deployment, maintenance, configuration, and troubleshooting.

• Utilizing and implementing configuration management and deployment tools such as Puppet and Kubernetes.

• Driving continuous enhancements by automating the installation, configuration, and upkeep of services on our platform.

• Collaborating closely with product teams to deliver scalable features to users, assisting with the architectural design of new services and ensuring they operate effectively at scale.

• Engaging in a 24/7 on-call rotation shared among the broader SRE team, which involves participating in incident response, diagnosing issues, and following up on system outages or alerts across Wikimedia’s production infrastructure.

• Working with a global, cross-functional team in an asynchronous communication setting.

• Guiding peers in your areas of technical expertise and operational strengths.

• Traveling 1-2 times a year for in-person events and team gatherings.


⛳️ Requirements

• Over 6 years of experience in an SRE, Operations, or DevOps role as part of a team.

• Proficiency with shell and various scripting languages relevant to an SRE context (Python, Go, Bash, Ruby; with a primary focus on Python) and configuration management tools (Puppet, Ansible; we use Puppet).

• Experience in designing and managing infrastructure security for a large array of diverse services.

• Involvement in technical responses during security incidents.

• Familiarity with package management on Linux systems, particularly Debian.

• Strong troubleshooting skills at the Linux system level.

• Proven track record of automating tasks and processes, identifying process gaps, and discovering automation opportunities.

• Excellent English language proficiency (both verbal and written) and capacity to work independently as an effective member of a globally distributed team across multiple time zones.

• Experience leading and participating in incident response and post-incident reviews, with a focus on root cause analysis and preventive measures.


🏝️ Benefits

• Competitive salary.

• Health insurance.

• Flexible working hours.

• Opportunities for professional development.
