Remotery

AI Infrastructure Operations Engineer

Posted 9 hours ago

This is a fully remote position, open to applicants in the United States.

📋 Description

• Ensure operational reliability for Companion across the AKS infrastructure, AI agent workloads, monitoring systems, and deployment pipelines.

• Develop effective observability practices that enable PHM to understand platform behavior, usage patterns, and operational risks before incidents occur.

• Establish sustainable operational hygiene practices involving patching, CVE remediation, secrets rotation, dependency management, and cloud maintenance cycles.

• Enhance platform resilience, documentation, and operational processes to facilitate scalability without dependence on tribal knowledge.

• Oversee and maintain AKS infrastructure, AI agent workloads, and deployment pipelines while supporting Azure services.

• Analyze incidents, troubleshoot production problems, and bolster platform resilience through improved operational patterns and tools.

• Assist in release operations to ensure that deployments are stable, observable, and recoverable.


⛳️ Requirements

• Extensive hands-on experience with Kubernetes operations, including troubleshooting workloads, admission controllers, cluster networking, and production incidents.

• Proficiency in supporting cloud-native infrastructure in Azure environments, specifically AKS and associated operational tools.

• Proven expertise in monitoring, observability, and incident response utilizing structured logging and metrics platforms.

• SRE mindset with experience managing on-call duties, prioritizing operations, and conducting post-incident reviews.

• Ability to operate in dynamic environments with incomplete documentation, evolving processes, and diverse ownership areas.

• Excellent communication and collaboration skills, including the ability to articulate technical issues clearly to both technical and non-technical audiences.


🏝️ Benefits

• Health, dental, and vision benefits

• Annual cash incentive program

• 401k with match

• Flexible PTO

• PHM for PHM — our services for you and your dependents
