
Senior Site Reliability Engineer
Posted Jun 12

This is a fully remote position, open to applicants in Ireland.
• Design, develop, and implement production systems that prioritize scalability, reliability, observability, and performance while adhering to strict security protocols.
• Create and sustain extensive automation solutions aimed at reducing repetitive tasks and enhancing operational efficiency in production environments.
• Actively monitor production systems, establish smart alerting strategies, and deploy automated incident response mechanisms to minimize downtime.
• Generate and update detailed incident response documentation; perform in-depth post-incident analyses to uncover root causes and prevent future occurrences.
• Work alongside software engineering teams to pinpoint and address infrastructure bottlenecks, crafting solutions that improve product deployment processes.
• Oversee and enhance monitoring infrastructure with industry-standard tools to ensure thorough visibility across all systems.
• Strategically plan, communicate, and carry out maintenance windows on production systems with minimal impact on service availability.
• Assess platform and infrastructure challenges with decisiveness and analytical rigor; liaise with third-party vendors and support teams as necessary.
• Implement new systems and updates using a staged, risk-managed approach, ensuring safe and incremental rollouts.
• Research and adopt best practices in infrastructure and platform management to uphold secure, scalable, and fault-tolerant systems.
• Examine the design and implementation details of open-source systems to improve troubleshooting capabilities and expedite issue resolution.
• Collaborate transparently with stakeholders to convey system status, planned maintenance, and infrastructure enhancements.
• Bachelor's degree in Computer Science, Engineering, or equivalent professional experience (5+ years in a related infrastructure or systems role).
• Proficiency in at least one of Go, Python, or Bash shell scripting, with the ability to implement medium-complexity automation workflows.
• Strong understanding of Linux or UNIX from both administrative and debugging perspectives.
• Practical experience in operating software systems, infrastructure, and complex applications at scale in production environments.
• Proven expertise in infrastructure-as-code principles and practices.
• Strong problem-solving and software troubleshooting abilities, with a methodical and analytical approach.
• Experience in server provisioning, particularly related to storage and networking.
• Demonstrated capability to collaborate within cross-functional teams and convey technical concepts effectively.
• Familiarity with incident response, postmortem analysis, and continuous improvement methodologies.
• Remote work options.