
Product Manager – AI Inference, Model Serving
Posted 2 days ago

This is a fully remote position, open to applicants in Texas.
• Take ownership of the product strategy, roadmap, and lifecycle for inference and model serving, encompassing serverless inference, dedicated endpoints, autoscaling, routing, KV cache management, and associated observability.
• Conduct deep technical explorations with NeoClouds, sovereign clouds, and enterprise platform teams, converting insights into prioritized requirements and architectural direction.
• Collaborate with engineering on system design trade-offs covering runtime integration, GPU scheduling, network, storage, and serving topology, which includes disaggregated serving and multi-model serving.
• Establish positioning based on measurable outcomes such as latency distributions, throughput per GPU, utilization, tail reliability, and cost per token.
• Drive go-to-market strategies, including pricing and packaging, reference architectures, sizing guides, PoC playbooks, and direct interaction with customers, analysts, and ecosystem partners.
• Over 7 years of experience in product management, technical product management, or a senior technical role responsible for AI/ML and inference product(s).
• In-depth understanding of production AI inference, encompassing model serving, serverless execution, dedicated endpoints, autoscaling, routing, workload placement, observability, and reliability.
• Proven ability to evaluate performance trade-offs across GPU, network, storage, orchestration, and runtime layers, and to translate low-level technical capabilities into business value indicators such as TTFT, throughput per GPU, and TCO.
• Familiarity with modern inference runtimes (vLLM, SGLang, TensorRT-LLM, Dynamo, Triton) and the essential optimization patterns in production: continuous batching, KV cache management, cold starts, prefill versus decode, disaggregated serving, and multi-model serving.
• Established credibility with engineering leaders and infrastructure operators, demonstrating comfort in production architecture reviews and technical discussions with platform engineering stakeholders.
• Join an established leader in cloud infrastructure from Silicon Valley.
• Collaborate with exceptionally passionate, talented, and engaging colleagues, assisting Fortune 500 and Global 2000 clients in implementing next-generation cloud technologies.
• Be part of innovative, cutting-edge open-source projects.
• Flourish in the dynamic environment of a young company that values openness, collaboration, risk-taking, and continuous growth.
• Opportunities for professional development and training.
• Attend conferences and working groups.
• Customized workstation options (macOS, Windows).
• Competitive compensation package complemented by a robust benefits plan and stock options.