This is a fully remote position, open to applicants in the Netherlands.

📋 Description

• Become part of a dynamic AI infrastructure team focused on large-scale machine learning workloads.

• Design and validate production-grade distributed training and large-scale inference architectures on large GPU clusters.

• Troubleshoot, enhance, and expand machine learning workloads across multi-node GPU setups.

• Serve as a technical expert on GPU performance and networking.

• Work collaboratively with engineering, product, and research & development teams.

⛳️ Requirements

• Practical experience in designing and managing enterprise-level, production-quality, multi-node GPU workloads for training (model sizes of 7B and above) or inference.

• Strong expertise in distributed deep learning frameworks (such as PyTorch Distributed, DeepSpeed, etc.) on GPU clusters.

• Comprehensive knowledge of GPU architecture and interconnect technologies (H100/A100 class, NVLink, InfiniBand).

• Familiarity with Kubernetes or Slurm.

• Experience with performance optimization using GPU profiling and monitoring tools.

🏝️ Benefits

• Health insurance.

• Flexible work arrangements.

• Opportunities for professional development.
