This is a fully remote position, open to applicants in Ukraine.

• Build and operate production-grade model serving infrastructure using frameworks such as vLLM, TGI, or Triton.

• Create and execute reliable deployment pipelines featuring blue/green and canary rollout strategies for machine learning models.

• Design and uphold auto-scaling systems, multi-model serving architectures, and smart request routing layers.

• Optimize GPU utilization, memory efficiency, network throughput, and model artifact storage performance.

• Develop observability systems to monitor inference latency, throughput, GPU usage, cost metrics, and overall system health.

• Oversee model registries and CI/CD pipelines to facilitate automated and reproducible model deployments.

• Manage the complete lifecycle of machine learning systems from development to production, including operational support and on-call duties.

• Establish engineering best practices and contribute to platform scalability within a dynamic startup setting.

• A minimum of 4 years of experience in ML Ops, Platform Engineering, SRE, or related infrastructure roles centered on machine learning systems.

• Practical experience with model serving frameworks such as vLLM, TGI, or Triton.

• Solid background in container orchestration and managing GPU-based workloads in a production environment.

• Familiarity with MLOps tools, including model registries, experiment tracking, and automated deployment pipelines.

• Proficiency in Python and infrastructure-as-code tools (e.g., Terraform or Helm).

• Strong grasp of distributed systems, performance optimization, and production reliability engineering.

• Ability to use AI coding assistants effectively to speed up development and debugging.

• Ownership mentality with the ability to work autonomously in a remote-first setup.

• Experience with machine learning platforms like Kubeflow, MLflow, or KubeAI (preferred).

• Understanding of GPU scheduling, CUDA/ROCm optimization, or multi-tenant inference systems (preferred).

• Background in cost optimization across various GPU types and inference workloads (preferred).

• Experience in early-stage startups or greenfield infrastructure initiatives (preferred).

• Proven track record of building production systems from the ground up rather than maintaining legacy platforms (preferred).

• Own critical infrastructure that powers a rapidly growing AI-native cloud platform.

• Build foundational ML inference systems from scratch within a high-growth, well-funded startup.

• Operate at the intersection of distributed systems, GPU computing, and sustainable cloud architecture.

• Acquire in-depth expertise in next-generation AI infrastructure and large-scale model serving systems.

• Impact core engineering decisions and establish best practices that will scale alongside the company.

AI Infrastructure Engineer – GPU
