
Senior Machine Learning Engineer – Inference Platform
Posted 1 day ago

This is a fully remote position, open to applicants in the United States.
• Take ownership of and enhance our multi-engine inference platform, accommodating various model types and serving needs.
• Develop and optimize production ML pipelines — transitioning models from experimentation to dependable, high-throughput serving.
• Establish and execute strategies for model versioning, rollout, rollback, and lifecycle management to ensure reproducibility and operational dependability.
• Set and uphold serving-layer SLAs, covering latency, availability, GPU utilization, Time-to-First-Token (TTFT), and Inter-Token Latency (ITL).
• Create observability, monitoring, alerting, and operational tools for production inference systems.
• Implement software engineering best practices, including testing, CI/CD integration, and reproducibility across ML workflows.
• Enhance inference performance through effective resource utilization, hardware-aware serving strategies, and cost-efficient infrastructure design.
• Ensure that ML serving systems are secure, scalable, and operationally resilient.
• Collaborate with ML, Data, Product, and DevOps teams to transform concepts into production systems, influencing technical decisions related to serving and scaling.
• Bachelor’s or Master’s degree in Computer Science, Data Science, Engineering, or a related field, or equivalent practical experience.
• 5–8+ years of experience in Software Engineering, ML Engineering, Platform Engineering, or Infrastructure Engineering, with direct responsibility for production ML serving systems.
• Hands-on experience operating an LLM serving engine (vLLM, TGI, TensorRT-LLM, or SGLang) in production under real load, not merely consuming managed or hosted endpoints.
• Proficiency in Python, strong software engineering fundamentals, and extensive knowledge of systems and infrastructure.
• Familiarity with cloud platforms such as AWS, GCP, or Azure, and experience with ML lifecycle tools, experimentation platforms, and model registries.
• Strong understanding of inference performance — including continuous batching, KV-cache and GPU-memory behavior, quantization, and CPU versus GPU bottlenecks — with a tendency to profile before optimizing.
• Experience managing heterogeneous workloads, including LLMs, embedding models, and extraction models, each with unique latency, throughput, and scaling demands.
• Proven ability to balance latency, throughput, reliability, and infrastructure costs while managing production-scale ML systems.
• Experience in high-growth startup settings and ability to thrive in rapidly changing technical environments.
• Health insurance
• Flexible work arrangements
• Professional development opportunities