
Senior Technical Product Manager, Observability
Posted 1 hour ago

This is a fully remote position, open to applicants in the United States.
Responsibilities
• Take ownership of the comprehensive Observability Platform roadmap, encompassing telemetry ingestion, querying, visualization, alerting, and retention for large-scale GPU clusters and multi-tenant cloud environments.
• Establish Vultr's observability strategy across bare metal, virtual machines, Kubernetes, and managed services, ensuring alignment with the infrastructure roadmap, reliability objectives, and customer experience.
• Lead the development of the customer-facing observability interface, which includes dashboards, APIs, telemetry pipelines, and topology-aware insights.
• Convert low-level signals from GPU, CPU, memory, storage, and network into actionable health views, alerts, and debugging workflows for customers.
• Collaborate closely with engineering on technical trade-offs involving metrics agents, collectors, data models, telemetry pipelines, APIs, and retention architecture.
• Build products for distributed AI environments that give customers insight into how training and inference workloads behave across nodes, clusters, schedulers, and network fabrics.
• Develop health models that enable customers to swiftly identify degraded nodes, performance anomalies, and cluster bottlenecks at fleet scale.
• Ensure that new infrastructure and platform launches are designed with observability in mind through strong collaboration with compute, network, and platform teams.
• Stay updated on contemporary observability stacks and AI infrastructure trends, including the impact of GPU workloads on performance analysis, cost attribution, and operational workflows.
Qualifications
• More than 7 years of product management experience in cloud infrastructure, observability, monitoring, or developer platforms.
• Comprehensive understanding of observability and monitoring systems, including metrics, logging, tracing, alerting, and telemetry pipeline architecture.
• Proven experience in defining product strategy and roadmaps for platform or infrastructure products at scale.
• Strong technical background with the ability to engage with engineering on telemetry agents, data models, query engines, retention, and distributed systems.
• Experience monitoring GPU, AI/ML, or HPC infrastructure, and familiarity with the observability challenges specific to training and inference workloads.
• Demonstrated history of delivering developer- and operator-facing products that have a measurable impact on reliability, time-to-detect, or operational efficiency.
• Experience collaborating with cross-functional teams (engineering, design, marketing, sales) in a dynamic environment.
• Exceptional written and verbal communication skills, capable of translating complex technical concepts for varied audiences.
• Bachelor's degree in Computer Science, Engineering, or a related field (or equivalent experience).
Benefits
• 100% company-paid insurance premiums for employee medical, dental, and vision plans.
• 401(k) plan with a 100% match up to 4%, featuring immediate vesting.
• Annual professional development reimbursement of $2,500.
• 11 holidays, plus accrued paid time off with a rollover plan.
• Commitment is essential at Vultr! Increased PTO at the 3-year and 10-year anniversaries, plus a 1-month paid sabbatical every 5 years and an anniversary bonus each year.
• $500 stipend for remote office setup in the first year and $400 each subsequent year.
• Internet reimbursement of up to $75 per month.
• Gym membership reimbursement of up to $50 per month.
• Company-paid Wellable subscription.