
ML Ops Engineer
Posted May 19

This is a fully remote position, open to applicants in Ukraine.
Responsibilities
• Build and operate production model-serving infrastructure using frameworks such as vLLM, TGI, or Triton.
• Design and implement reliable deployment pipelines for machine learning models, using blue/green and canary rollout strategies.
• Develop and maintain autoscaling systems, multi-model serving architectures, and intelligent request-routing layers.
• Optimize GPU utilization, memory efficiency, network throughput, and model-artifact storage performance.
• Design observability systems that track inference latency, throughput, GPU usage, cost, and overall system health.
• Oversee model registries and CI/CD pipelines to enable automated, reproducible model deployments.
• Own the full lifecycle of machine learning systems from development to production, including operational support and on-call duties.
• Establish engineering best practices and help scale the platform in a fast-paced startup environment.

Requirements
• At least 4 years of experience in ML Ops, Platform Engineering, SRE, or comparable infrastructure roles focused on machine learning systems.
• Hands-on experience with model-serving frameworks such as vLLM, TGI, Triton, or similar.
• Strong expertise in container orchestration and in running GPU-based workloads in production.
• Familiarity with MLOps tooling, including model registries, experiment tracking, and automated deployment pipelines.
• Proficiency in Python and infrastructure-as-code tools (e.g., Terraform, Helm, or similar).
• Solid understanding of distributed systems, performance optimization, and production reliability engineering.
• Ability to use AI coding assistants effectively to speed up development and debugging.
• An ownership mindset and the ability to work independently in a remote-first environment.

Why join
• Own critical infrastructure behind a rapidly growing AI-native cloud platform.
• Build foundational ML inference systems from the ground up at a high-growth, well-funded startup.
• Work at the intersection of distributed systems, GPU computing, and sustainable cloud architecture.
• Gain deep expertise in next-generation AI infrastructure and large-scale model-serving systems.
• Influence key engineering decisions and establish best practices that will scale with the company.