
Senior Technical Product Manager, GPU Orchestration
Posted 22 hours ago

Responsibilities
• Develop and execute the strategic roadmap for managed Kubernetes, managed Slurm services, SUNK, and Run:ai integration.
• Own the full cluster lifecycle: provisioning, configuration, upgrades, scaling, high availability, and decommissioning.
• Build scheduling and resource management capabilities for GPU workloads, including quotas, fair-share policies, multi-tenant isolation, and priority management.
• Drive integration between orchestration services and core infrastructure components such as networking, storage, identity, observability, and billing.
• Define service-level objectives (SLOs) for control plane reliability, job scheduling latency, cluster availability, and upgrade stability.
• Design APIs, CLI tools, and UI workflows that enable self-service cluster management and workload operations.
• Work with customer-facing teams to understand training, inference, and HPC use cases, and translate real workload requirements into product capabilities.
• Stay current on industry trends in container orchestration, HPC scheduling, distributed systems, and AI infrastructure to inform product direction.

Qualifications
• 7+ years of product management experience in cloud infrastructure, container orchestration, HPC, or developer platforms.
• Deep understanding of Kubernetes, Slurm, or comparable orchestration and scheduling systems, including GPU scheduling, resource management, and multi-tenant isolation.
• Proven experience in defining product strategies and roadmaps for platform or infrastructure products at scale.
• Strong technical depth: able to engage with engineering teams on cluster lifecycle, control plane reliability, API design, and distributed systems.
• Experience with AI/ML infrastructure, covering training workloads, inference serving, and GPU resource optimization.
• A track record of shipping developer- and operator-facing products with measurable impact on reliability, adoption, or operational efficiency.
• Proven ability to collaborate with cross-functional teams (engineering, design, marketing, sales) in a fast-moving environment.
• Excellent written and verbal communication skills, with the ability to explain complex technical ideas to diverse audiences.
• Bachelor's degree in Computer Science, Engineering, or a related field (or equivalent experience).

Benefits
• 100% company-covered insurance premiums for employee medical, dental, and vision plans.
• 401(k) plan with a 100% match on contributions up to 4%, with immediate vesting.
• Professional Development Reimbursement of $2,500 annually.
• 11 holidays, plus accrued paid time off with a rollover plan.
• Commitment is key at Vultr! Earn increased PTO at your 3-year and 10-year anniversaries, a 1-month paid sabbatical every 5 years, and an anniversary bonus each year.
• $500 stipend for remote office setup in the first year, followed by $400 each subsequent year.
• Internet reimbursement of up to $75 per month.
• Gym membership reimbursement of up to $50 per month.
• Company-paid Wellable subscription.