This is a fully remote position, open to applicants in Australia.

• Enhancing production reliability and system resilience within a team focused on Site Reliability Engineering (SRE)

• Advocating for high-quality work and adherence to industry best practices

• Engaging with teams and stakeholders throughout all project phases

• Introducing innovative ideas and fostering a culture of creativity

• Addressing complex technical challenges with a proactive attitude

• Collaborating across various technologies in a rapidly evolving industry

• Participating in on-call rotations, managing incident response, and conducting blameless post-incident reviews

• Writing code, responding to alerts, enhancing solutions, and providing support to team members

• Contributing significantly to the success of both your team and the company

• Over 5 years of experience managing Linux systems and associated infrastructure within production settings

• A collaborative mindset typical of SRE professionals, with knowledge of SLIs, SLOs, SLAs, error budgets, blast radius, and blameless postmortems

• A commitment to automation, reducing repetitive tasks, and minimizing problem recurrence

• Proven experience in creating runbooks that benefit the entire team, not just individual users

• Solid understanding of Kubernetes and its broader ecosystem

• Experience with cloud infrastructure, preferably AWS; familiarity with bare-metal setups is a plus

• Proficiency in building tools with Bash and with Python, Go, or a similar language

• Familiarity with Infrastructure-as-Code tools, with a preference for Terraform

• Experience with CI/CD processes and version control, preferably GitHub

• Database expertise in one of the following: Postgres, Cassandra, or ClickHouse

• Experience managing a production observability stack (metrics, logs, traces), with a focus on extracting meaningful insights

• Comfort working with live production infrastructure, with strong troubleshooting skills and ownership during incident response

• A history of ongoing professional development

• A self-motivated approach suited for an asynchronous, globally distributed team, with a readiness to take on additional tasks as needed

• Flexible working arrangements

• Birthday leave

• Generous funding for study and training, along with 5 days of paid study leave

• Creative, enjoyable, and modern work environments

• A driven team of industry professionals alongside emerging talent

• Recognition of achievements through ‘Legend’ and ‘Kudos’ awards

• Comprehensive health and wellness programs

Senior Site Reliability Engineer
