
AI Infrastructure Engineer – GPU
Posted May 20

This is a fully remote position, open to applicants in Ukraine.
• Build and operate production-grade model serving infrastructure using frameworks such as vLLM, TGI, Triton, or similar.
• Design and implement reliable deployment pipelines with blue/green and canary rollout strategies for machine learning models.
• Design and maintain auto-scaling systems, multi-model serving architectures, and intelligent request routing layers.
• Optimize GPU utilization, memory efficiency, network throughput, and the performance of model artifact storage.
• Develop observability systems to monitor inference latency, throughput, GPU usage, cost metrics, and overall system health.
• Oversee model registries and CI/CD pipelines to facilitate automated and reproducible model deployments.
• Manage the complete lifecycle of machine learning systems from development to production, including operational support and on-call duties.
• Establish engineering best practices and contribute to platform scalability within a dynamic startup setting.
• A minimum of 4 years of experience in MLOps, Platform Engineering, SRE, or related infrastructure roles centered on machine learning systems.
• Practical experience with model serving frameworks such as vLLM, TGI, Triton, or similar.
• Solid background in container orchestration and managing GPU-based workloads in a production environment.
• Familiarity with MLOps tools, including model registries, experiment tracking, and automated deployment pipelines.
• Proficiency in Python and infrastructure-as-code tools (e.g., Terraform, Helm, or similar).
• Strong grasp of distributed systems, performance optimization, and production reliability engineering.
• Ability to use AI coding assistants effectively to speed up development and debugging.
• Ownership mentality with the ability to work autonomously in a remote-first setup.
• Experience with machine learning platforms like Kubeflow, MLflow, or KubeAI (preferred).
• Understanding of GPU scheduling, CUDA/ROCm optimization, or multi-tenant inference systems (preferred).
• Background in cost optimization across various GPU types and inference workloads (preferred).
• Experience in early-stage startups or greenfield infrastructure initiatives (preferred).
• Proven track record in building production systems from the ground up rather than maintaining legacy platforms (preferred).
• Own critical infrastructure that powers a rapidly growing AI-native cloud platform.
• Build foundational ML inference systems from scratch within a high-growth, well-funded startup.
• Operate at the intersection of distributed systems, GPU computing, and sustainable cloud architecture.
• Develop deep expertise in next-generation AI infrastructure and large-scale model serving systems.
• Influence core engineering decisions and establish best practices that will scale alongside the company.