Remotery

Staff Site Reliability Engineer – Observability, DevOps

Posted May 20

This is a fully remote position, open to applicants in Hungary.

📋 Description

• Design, build, and manage observability platforms using Grafana and Prometheus.

• Define and maintain metrics standards, dashboards, alerts, and Service Level Objectives (SLOs).

• Improve signal quality by reducing alert noise, tuning thresholds, and refining runbooks.

• Support incident response by providing actionable telemetry and running post-incident reviews.

• Integrate metrics, logs, and traces across various distributed systems.

• Collaborate with engineering teams to ensure proper instrumentation of services.

• Automate the configuration of observability using infrastructure as code.

• Contribute to reliability enhancements through capacity planning and performance analysis.


⛳️ Requirements

• Extensive experience with Prometheus, including scraping, federation, recording rules, and alerting.

• Significant experience with Grafana, covering dashboards, alerting, templating, and role-based access control (RBAC).

• Strong fundamentals in Linux and networking.

• Experience managing observability stacks within Kubernetes environments.

• Proficiency in infrastructure as code, with a preference for Terraform.

• Familiarity with incident management and on-call procedures.

• Ability to troubleshoot production systems using metrics and logs.

Nice to have:

• Experience with logging and tracing tools such as Loki, Tempo, or OpenTelemetry.

• Experience operating large-scale or multi-cluster Kubernetes platforms.

• Familiarity with cloud platforms like GCP, AWS, or OCI.

• Exposure to Site Reliability Engineering (SRE) concepts, including error budgets and SLO-driven prioritization.


🏝️ Benefits

• Competitive salary and performance-based bonuses.

• Flexible working hours and remote work options.

• Professional development opportunities and support for certifications.

• Health and wellness benefits, including medical, dental, and vision insurance.

• A collaborative and inclusive work environment.

People also viewed

Advanced Solutions International, Inc. · 10 hours ago

DevOps Reliability Engineer

Australia only · Full-time · DevOps & Site Reliability Engineer (SRE) · $90k – $110k/year

Stone · 10 hours ago

Senior Site Reliability Engineer – Network

Brazil only · Full-time · DevOps & Site Reliability Engineer (SRE)

Replit · 1 day ago

Staff Site Reliability Engineer

Europe · Full-time · DevOps & Site Reliability Engineer (SRE)

Soum · 1 day ago

DevOps Engineer, Mid Level

Egypt only · Full-time · DevOps & Site Reliability Engineer (SRE)

Lakeside Software · 1 day ago

DevOps Engineer, Azure

India only · Full-time · DevOps & Site Reliability Engineer (SRE)

Interval Group · 1 day ago

DevOps Engineer, mk8s

Germany only · Freelance · DevOps & Site Reliability Engineer (SRE)