
Staff Site Reliability Engineer – Observability, DevOps
Posted May 20

This is a fully remote position, open to applicants in Hungary.
Responsibilities:
• Design, construct, and manage observability platforms utilizing Grafana and Prometheus.
• Establish and uphold metrics standards, dashboards, alerts, and Service Level Objectives (SLOs).
• Enhance signal integrity by minimizing alert noise, adjusting thresholds, and refining runbooks.
• Assist in incident response by delivering actionable telemetry and conducting post-incident evaluations.
• Integrate metrics, logs, and traces across various distributed systems.
• Collaborate with engineering teams to ensure proper instrumentation of services.
• Automate the configuration of observability using infrastructure as code.
• Contribute to reliability enhancements through capacity planning and performance analysis.
Requirements:
• Extensive experience with Prometheus, including scraping, federation, recording rules, and alerting.
• Significant experience with Grafana, covering dashboards, alerting, templating, and role-based access control (RBAC).
• Strong fundamentals in Linux and networking.
• Experience managing observability stacks within Kubernetes environments.
• Proficiency in infrastructure as code, with a preference for Terraform.
• Familiarity with incident management and on-call procedures.
• Ability to troubleshoot production systems using metrics and logs.
Nice to have:
• Background in logs and traces, such as Loki, Tempo, or OpenTelemetry.
• Experience operating large-scale or multi-cluster Kubernetes platforms.
• Familiarity with cloud platforms like GCP, AWS, or OCI.
• Exposure to Site Reliability Engineering (SRE) concepts, including error budgets and SLO-driven prioritization.
Benefits:
• Competitive salary and performance-based bonuses.
• Flexible working hours and remote work options.
• Professional development opportunities and support for certifications.
• Health and wellness benefits, including medical, dental, and vision insurance.
• A collaborative and inclusive work environment.