
Capacity Operations and Analytics Manager
Posted May 21

This is a fully remote position, open to applicants in Brazil.
• Oversee and enhance GPU capacity along with other computing resources across multiple cloud service providers to accommodate increasing demands and ensure optimal utilization.
• Design, develop, and maintain data models, reporting systems, automation frameworks, dashboards, and performance metrics that underpin NVIDIA Infrastructure governance initiatives and strategic capacity planning.
• Assess the technical and business requirements for GPU capacity and other computing resources from a variety of internal and external stakeholders.
• Detect performance bottlenecks in the everyday usage of computing resources and collaborate with the appropriate infrastructure teams to address them.
• Drive initiatives to improve infrastructure resource efficiency in collaboration with engineering, finance, and product teams.
• Create and enhance tools for our cloud infrastructure and analytics platform to maximize resource utilization and performance for NVIDIA and its clients. This involves designing and developing tools for workflow automation and potentially using AI techniques to derive valuable signals and insights from generated data.
• Collaborate cross-functionally with Finance, Product, Service Owners, and Infrastructure Engineering teams to align cloud capacity management with corporate objectives and to establish Infrastructure and Service Level Key Performance Indicators (KPIs) that reflect customer satisfaction.
• Spearhead multi-year budget-driven planning for computing resources in partnership with engineering.
• Bachelor's or Master's degree in Computer Science, Software Engineering, or a related discipline, or equivalent professional experience.
• Over 10 years of comprehensive experience in cloud computing, particularly in managing or sourcing GPU capacity with cloud service providers.
• A proven history of managing large-scale computing operations and planning is an advantage.
• Strong technical knowledge in cloud architecture, development and deployment, and handling large datasets.
• In-depth understanding of cloud service models (IaaS, PaaS, SaaS) and cloud infrastructure technologies.
• Experience with cloud service providers such as AWS, Azure, GCP, and OCI is required.
• Demonstrated experience in utilizing AI tools and techniques to derive valuable signals and insights from data, particularly for enhancing resource usage and automation.
• Solid understanding and practical application of statistical modeling and machine learning methodologies to boost operational efficiency and guide strategic capacity planning.
• Expertise in data analytics, visualization, and monitoring tools such as Kibana, Grafana, Splunk, Prometheus, Tableau, and Plotly.
• Familiarity with analytics, statistical modeling, and machine learning methodologies.
• Exceptional communication and interpersonal abilities, with a knack for effective collaboration across departments and influencing strategic decisions.
• Capability to function effectively in uncertain and rapidly changing business environments, demonstrating an agile mindset and a dedication to continuous improvement.
• Health insurance
• Professional development opportunities
Infomineo