Remotery

Senior Site Reliability Engineer

Posted Jun 12

This is a fully remote position, open to applicants in Ireland.

📋 Description

• Design, develop, and implement production systems that prioritize scalability, reliability, observability, and performance while adhering to strict security protocols.

• Create and sustain extensive automation solutions aimed at reducing repetitive tasks and enhancing operational efficiency in production environments.

• Actively monitor production systems, establish smart alerting strategies, and deploy automated incident response mechanisms to minimize downtime.

• Generate and update detailed incident response documentation; perform in-depth post-incident analyses to uncover root causes and prevent future occurrences.

• Work alongside software engineering teams to pinpoint and address infrastructural bottlenecks, crafting innovative solutions that improve product deployment processes.

• Oversee and enhance monitoring infrastructure with industry-standard tools to ensure thorough visibility across all systems.

• Strategically plan, communicate, and carry out maintenance windows on production systems with minimal impact on service availability.

• Assess platform and infrastructural challenges with decisiveness and analytical rigor; liaise with third-party vendors and support teams as necessary.

• Implement new systems and updates in a staged, risk-managed approach, ensuring safe and incremental rollouts.

• Research and adopt best practices in infrastructure and platform management to uphold secure, scalable, and fault-tolerant systems.

• Examine the design and implementation details of open-source systems to improve troubleshooting capabilities and expedite issue resolution.

• Collaborate transparently with stakeholders to convey system status, planned maintenance, and infrastructure enhancements.


⛳️ Requirements

• Bachelor's degree in Computer Science or Engineering, or equivalent professional experience (5+ years in a related infrastructure or systems role).

• Proficiency in one or more of Go, Python, or Bash shell scripting, with the ability to implement medium-complexity automation workflows.

• Strong understanding of Linux or UNIX from both administrative and debugging perspectives.

• Practical experience in operating software systems, infrastructure, and complex applications at scale in production environments.

• Proven expertise in infrastructure-as-code principles and practices.

• Strong problem-solving and software troubleshooting abilities, with a methodical and analytical approach.

• Experience in server provisioning, particularly related to storage and networking.

• Demonstrated capability to collaborate within cross-functional teams and convey technical concepts effectively.

• Familiarity with incident response, postmortem analysis, and continuous improvement methodologies.


🏝️ Benefits

• Remote work options.
