
Staff Software Engineer – Grafana Cloud, k6
Posted 20 hours ago

• Cultivate and enhance a robust culture of operational excellence by establishing standards and mentoring teams to take ownership of reliability and availability.
• Drive advanced DevOps/SRE practices, including incident response and post-incident reviews, on-call preparedness, runbooks, alerting, observability, and release/change management.
• Develop reliability frameworks including SLIs/SLOs and error budgets, utilizing them to steer prioritization and engineering decisions.
• Offer insights into system health through transparent operational metrics and reliability reports.
• Assist teams in the design, development, evolution, and management of large-scale, distributed cloud infrastructures.
• Shape product and system strategy through design evaluations, architectural discussions, and collaborative efforts across teams.
• Disseminate knowledge through well-structured, high-quality documentation and technical communication, both internally and, when suitable, externally, to empower teams to build and manage systems more efficiently.
• As the reliability framework advances, transition into broader application and product development leadership, contributing architectural and technical expertise beyond operations.
• Extensive experience with DevOps/SRE methodologies, including managing and evolving production systems at scale.
• Solid programming skills in a modern language (Python and Go are preferred, but prior experience with them is not mandatory).
• Experience in designing, constructing, and operating large-scale distributed systems.
• Strong grasp of reliability engineering principles (e.g., incident management, observability, and failure modes).
• Familiarity with test automation, encompassing performance and functional testing.
• Ability to influence engineering practices through effective technical communication, reviews, and collaboration.
• Excellent interpersonal skills and the capability to work efficiently across teams.
• Knowledge of contemporary software engineering processes and delivery methodologies.
• Self-motivated and comfortable working with a significant level of autonomy and uncertainty.
• Equity
• Bonus (if applicable)
• Global annual leave policy of 30 days per year
• 3 days reserved for Grafana Shutdown Days