This is a fully remote position, open to applicants in Texas.

📋 Description

• Take ownership of the product strategy, roadmap, and lifecycle for inference and model serving, encompassing serverless inference, dedicated endpoints, autoscaling, routing, KV cache management, and associated observability.

• Conduct deep technical explorations with NeoClouds, sovereign clouds, and enterprise platform teams, converting insights into prioritized requirements and architectural direction.

• Collaborate with engineering on system design trade-offs covering runtime integration, GPU scheduling, networking, storage, and serving topology, including disaggregated serving and multi-model serving.

• Establish positioning based on measurable outcomes such as latency distributions, throughput per GPU, utilization, tail reliability, and cost per token.

• Drive go-to-market strategies, including pricing and packaging, reference architectures, sizing guides, PoC playbooks, and direct interaction with customers, analysts, and ecosystem partners.

⛳️ Requirements

• Over 7 years of experience in product management, technical product management, or a senior technical role responsible for AI/ML and inference products.

• In-depth understanding of production AI inference, encompassing model serving, serverless execution, dedicated endpoints, autoscaling, routing, workload placement, observability, and reliability.

• Proven ability to evaluate performance trade-offs across GPU, network, storage, orchestration, and runtime layers, and to translate low-level technical capabilities into business value indicators such as TTFT, throughput per GPU, and TCO.

• Familiarity with modern inference runtimes (vLLM, SGLang, TensorRT-LLM, Dynamo, Triton) and the essential optimization patterns in production: continuous batching, KV cache management, cold starts, prefill versus decode, disaggregated serving, and multi-model serving.

• Established credibility with engineering leaders and infrastructure operators, demonstrating comfort in production architecture reviews and technical discussions with platform engineering stakeholders.

🏝️ Benefits

• Join an established leader in cloud infrastructure from Silicon Valley.

• Collaborate with exceptionally passionate, talented, and engaging colleagues, assisting Fortune 500 and Global 2000 clients in implementing next-generation cloud technologies.

• Be part of innovative, cutting-edge open-source projects.

• Flourish in the dynamic environment of a young company that values openness, collaboration, risk-taking, and continuous growth.

• Opportunities for professional development and training.

• Attend conferences and working groups.

• Customized workstation options (macOS, Windows).

• Competitive compensation package complemented by a robust benefits plan and stock options.

Product Manager – AI Inference, Model Serving
