
Senior Site Reliability Engineer
Posted May 9

This is a fully remote position, open to applicants in Australia.
• Enhancing production reliability and system resilience within a team focused on Site Reliability Engineering (SRE)
• Advocating for high-quality work and adherence to industry best practices
• Engaging with teams and stakeholders throughout all project phases
• Introducing innovative ideas and fostering a culture of creativity
• Addressing complex technical challenges with a proactive attitude
• Collaborating across various technologies in a rapidly evolving industry
• Participating in on-call rotations, managing incident responses, and conducting blameless post-incident reviews
• Writing code, responding to alerts, enhancing solutions, and providing support to team members
• Contributing significantly to the success of both your team and the company
• Over 5 years of experience managing Linux systems and associated infrastructure within production settings
• A collaborative mindset typical of SRE professionals, with knowledge of SLIs, SLOs, SLAs, error budgets, blast radius, and blameless postmortems
• A commitment to automation, reducing repetitive tasks, and minimizing problem recurrence
• Proven experience in creating runbooks that benefit the entire team, not just individual users
• Solid understanding of Kubernetes and its broader ecosystem
• Experience with cloud infrastructure, preferably AWS; familiarity with bare-metal setups is a plus
• Proficiency in building tools with Bash and with Python, Go, or a similar language
• Familiarity with Infrastructure-as-Code tools, with a preference for Terraform
• Experience with CI/CD processes and version control, preferably GitHub
• Database expertise in one of the following: Postgres, Cassandra, or ClickHouse
• Experience managing a production observability stack (metrics, logs, traces), with a focus on extracting meaningful insights
• Comfortable working with live production infrastructure, exhibiting strong troubleshooting skills and ownership during incident responses
• A history of ongoing professional development
• A self-motivated approach suited for an asynchronous, globally distributed team, with a readiness to take on additional tasks as needed
• Flexible working arrangements
• Birthday leave
• Generous funding for study and training, along with 5 days of paid study leave
• Creative, enjoyable, and modern work environments
• A driven team of industry professionals alongside emerging talent
• Recognition of achievements through ‘Legend’ and ‘Kudos’ awards
• Comprehensive health and wellness programs