Remotery

Senior Machine Learning Engineer – Inference Platform

Posted 1 day ago

This is a fully remote position, open to applicants in the United States.

📋 Description

• Own and enhance our multi-engine inference platform, supporting diverse model types and serving requirements.

• Develop and optimize production ML pipelines — transitioning models from experimentation to dependable, high-throughput serving.

• Establish and execute strategies for model versioning, rollout, rollback, and lifecycle management to ensure reproducibility and operational dependability.

• Set and uphold serving-layer SLAs, covering latency, availability, GPU utilization, Time-to-First-Token (TTFT), and Inter-Token Latency (ITL).

• Create observability, monitoring, alerting, and operational tools for production inference systems.

• Implement software engineering best practices, including testing, CI/CD integration, and reproducibility across ML workflows.

• Enhance inference performance through effective resource utilization, hardware-aware serving strategies, and cost-efficient infrastructure design.

• Ensure that ML serving systems are secure, scalable, and operationally resilient.

• Collaborate with ML, Data, Product, and DevOps teams to transform concepts into production systems, influencing technical decisions related to serving and scaling.


⛳️ Requirements

• Bachelor’s or Master’s degree in Computer Science, Data Science, Engineering, or a related field, or equivalent practical experience.

• 5–8+ years of experience in Software Engineering, ML Engineering, Platform Engineering, or Infrastructure Engineering, with direct responsibility for production ML serving systems.

• Practical experience managing an LLM serving engine (vLLM, TGI, TensorRT-LLM, or SGLang) in production under real load — not merely managed or hosted endpoints.

• Proficiency in Python, strong software engineering fundamentals, and deep knowledge of systems and infrastructure.

• Familiarity with cloud platforms such as AWS, GCP, or Azure, and experience with ML lifecycle tools, experimentation platforms, and model registries.

• Strong understanding of inference performance (continuous batching, KV-cache and GPU-memory behavior, quantization, and CPU-versus-GPU bottlenecks), with a habit of profiling before optimizing.

• Experience managing heterogeneous workloads, including LLMs, embedding models, and extraction models, each with unique latency, throughput, and scaling demands.

• Proven ability to balance latency, throughput, reliability, and infrastructure costs while managing production-scale ML systems.

• Experience in high-growth startup settings and the ability to thrive in rapidly changing technical environments.


🏝️ Benefits

• Health insurance

• Flexible work arrangements

• Professional development opportunities
