Remotery

Site Reliability Engineer – AI

Posted May 30

This is a fully remote position, open to applicants in Poland.

📋 Description

• Develop and sustain a centralized monitoring and alerting framework for AI applications and their pipelines.

• Define and implement Service Level Indicators (SLIs), alerts, and operational dashboards.

• Oversee incident management, which includes triage, coordination, root cause analysis, and implementing preventive measures.

• Standardize telemetry across various systems, focusing on latency, throughput, and failure metrics.

• Enhance Continuous Integration and Continuous Deployment (CI/CD) pipelines by introducing quality gates to ensure reliability.

• Collaborate closely with engineering teams to minimize recurring issues and enhance overall system stability.


⛳️ Requirements

• At least 5 years of experience in Site Reliability Engineering (SRE), Platform Engineering, or Production Engineering.

• Extensive hands-on experience with Kubernetes in production settings.

• Proficiency with Azure and Azure DevOps.

• Familiarity with monitoring tools such as Datadog.

• Strong knowledge of incident management processes and root cause analysis.

• Ability to design and build effective monitoring and alerting systems.

• Nice to have: Experience with AI or large language model (LLM) pipelines.

• Nice to have: Experience building monitoring platforms that span multiple systems.

• Nice to have: Familiarity with Grafana.

• Nice to have: Experience in large-scale or distributed environments.


🏝️ Benefits

• Competitive and attractive salary.

• Opportunity to work in a multinational setting on international projects.

• Comprehensive healthcare coverage.

• Long-term B2B contract with a stable pipeline of projects.

• Fully remote work model.
