Remotery

Senior Site Reliability Engineer

Posted Jun 20

This is a fully remote position, open to applicants in the United States.

📋 Description

• Contribute to system observability by implementing and enhancing metrics, alerting, and dashboards to gain better insights and achieve quicker recovery.

• Develop automation, tools, and monitoring solutions aimed at ensuring high service availability.

• Collaborate with application and quality engineering teams to adopt best practices in reliability, release automation, and testing.

• Promote operational excellence through proactive incident prevention, conducting blameless postmortems, and engaging in capacity planning.

• Take part in on-call rotations to support critical services and guarantee a swift response to incidents.


⛳️ Requirements

• Solid experience in Python, particularly for automation, tooling, and data-driven operational tasks.

• Proficiency in at least one additional programming language (Java, C++, or Go).

• Strong understanding of Linux systems, cloud infrastructure (AWS, GCP, or Azure), and contemporary deployment practices (Docker, Kubernetes, Terraform).

• Experience with CI/CD pipelines, version control, and automated testing frameworks.

• Familiarity with observability tools (e.g., Prometheus, Grafana, ELK, Datadog) and log/metric analysis for troubleshooting issues.

• Proven experience facilitating and documenting Critical User Journeys and translating them into actionable SLAs/SLOs for automation.

• Demonstrated capability to work with cross-functional teams and communicate effectively in high-stakes situations.

• A problem-solver who views reliability as a collective responsibility within engineering.

• Familiarity with AI-augmented development tools (Claude, Codex) as part of a modern engineering workflow.

Nice to Have

• Experience in writing or maintaining end-to-end or integration tests for distributed systems.

• Background in performance testing, capacity planning, or chaos engineering.

• Contributions to internal developer tools or reliability-focused frameworks.

• Exposure to security, compliance, or change management processes in production environments.

• Relevant certifications.


🏝️ Benefits

• Multiple medical insurance plans to select from.

• Dental, vision, life, and disability insurance.

• Employee Emergency Fund.

• Company equity (stock options).

• Open PTO policy.

• 401(k) plan with company matching.

• Hybrid/flexible work environment.

