
Staff Software Engineer – Grafana Cloud, k6
Posted 17 hours ago

Responsibilities
• Build and strengthen a culture of operational excellence by setting standards and mentoring teams to take ownership of reliability and availability.
• Implement mature DevOps/SRE practices, including incident response, post-incident reviews, on-call readiness, runbooks, alerting, observability, and release and change management.
• Define reliability frameworks such as SLIs, SLOs, and error budgets, and use them to guide prioritization and engineering decisions.
• Make system health transparent through comprehensive operational metrics and reliability reports.
• Help teams design, build, evolve, and maintain large-scale distributed cloud systems.
• Shape product and system direction through design reviews, architecture discussions, and cross-team collaboration.
• Share knowledge through clear, high-quality documentation and technical communication (internally and, where relevant, externally) to help teams build and operate systems effectively.
• As the reliability foundation matures, take on broader leadership in application and product development, contributing architectural and technical expertise beyond operations.

Requirements
• Extensive experience with DevOps/SRE practices, particularly in operating and evolving production systems at scale.
• Strong programming skills in a modern language (our primary languages are Python and Go, but prior experience with them is not required).
• Experience designing, building, and operating large-scale distributed systems.
• Solid understanding of reliability engineering principles (e.g., incident management, observability, and failure modes).
• Experience with test automation, covering both performance and functional testing.
• Ability to influence engineering practices through effective technical communication, reviews, and collaboration.
• Excellent interpersonal skills, enabling effective collaboration across teams.
• Familiarity with modern software engineering processes and delivery methodologies.
• Proactive, and comfortable working with a high degree of autonomy and ambiguity.

What we offer
• 100% Remote, Global Culture
• Scaling Organization
• Transparent Communication
• Innovation-Driven
• Open Source Roots
• Empowered Teams
• Career Growth Pathways
• Approachable Leadership
• Passionate People
• In-Person Onboarding
• Balance is Key: 30 days of annual leave