
Platform Support Engineer – Australian Capital Cities
Posted May 23
This is a fully remote position, open to applicants in Australia.
Responsibilities
• Monitor platform health daily in Datadog, reviewing alerts, SLO burn rates, and system metrics at the start of each shift;
• Enhance SLO coverage and reporting, ensuring visibility into SLO status and proactively addressing violations;
• Tune monitors and alert thresholds to improve the signal-to-noise ratio, and document all findings and changes;
• Identify and fix observability gaps, such as uninstrumented failure modes, missing dashboards, and blind spots revealed through incidents and Customer Experience feedback;
• Create and maintain runbooks for failure modes and support workflows that arise during AUS hours;
• Handle access management, including onboarding and offboarding, across Cloudflare, Datadog, and Mailgun;
• Participate in the Platform team's tier-2 on-call rotation during AUS working hours, performing initial triage and escalating appropriately when necessary;
• Hold weekly syncs with the AUS Customer Experience team to identify observability gaps and runbook deficiencies, and to share advance notice of upcoming customer activities;
• Perform other duties as assigned.
Requirements
• Bachelor's degree in Computer Science, Computer Engineering, Information Technology, or equivalent practical experience;
• More than 3 years of experience in Site Reliability Engineering, Platform Support, DevOps, or an observability-focused role in a SaaS environment;
• Practical experience with Datadog or similar observability platforms, including monitors, dashboards, logs, APM, and SLOs;
• Ability to recognize patterns in platform behavior and convert insights into actionable monitoring, documentation, and system enhancements;
• Demonstrated ability to exercise sound judgment in issue triage, differentiate between signal and noise, escalate when necessary, and communicate effectively under pressure;
• Knowledge of relevant industry regulations and operational guidelines;
• Experience with Cloudflare, Mailgun, and/or similar operational tools (preferred);
• Familiarity with Azure infrastructure and fundamentals of cloud networking;
• Experience in a GitOps infrastructure-as-code environment;
• Prior experience with incident management frameworks and post-incident practices.
Benefits
• Fully remote work environment;
• Company-provided computer equipment;
• Company contribution towards phone and internet expenses;
• 12% superannuation.
Attio
TechBiz Global