
Staff Software Engineer – Grafana Cloud, k6
Posted 18 hours ago

• Cultivate and enhance a robust culture of operational excellence by establishing standards and mentoring teams to take ownership of reliability and availability.
• Implement advanced DevOps/SRE methodologies, including incident response and post-incident reviews, on-call preparedness, runbooks, alerting, observability, and management of releases and changes.
• Create reliability frameworks such as Service Level Indicators (SLIs), Service Level Objectives (SLOs), and error budgets, utilizing them to steer prioritization and engineering decisions.
• Offer insights into system health through transparent operational metrics and reliability reporting.
• Assist teams in the design, development, evolution, and maintenance of large-scale, distributed cloud infrastructures.
• Shape product and system trajectories through design evaluations, architectural discussions, and cross-team collaboration.
• Share knowledge through clear, high-quality documentation and technical communication, both internally and, when appropriate, externally, so that teams can build and manage systems more efficiently.
• As the reliability foundation advances, expand into broader application and product development leadership, contributing architectural and technical expertise beyond operational concerns.
• Extensive experience with DevOps/SRE practices, including the operation and evolution of production systems at scale.
• Solid programming expertise in a modern language (Python and Go are preferred, but prior experience with them is not mandatory).
• Experience in designing, constructing, and managing large-scale distributed systems.
• Deep understanding of reliability engineering principles (e.g., incident management, observability, and failure modes).
• Experience with test automation, including performance and functional testing methodologies.
• Capability to influence engineering practices through effective technical communication, reviews, and collaboration.
• Strong interpersonal abilities and proficiency in collaborating across teams.
• Familiarity with contemporary software engineering processes and delivery methodologies.
• Self-motivated and comfortable working with a significant degree of autonomy and ambiguity.
• Equity
• Bonus (if applicable)
• 30 days of annual leave
• Grafana Shutdown Days to provide the team with opportunities to disconnect