This is a fully remote position, open to applicants in Czechia.

📋 Description

• Optimizing the performance of GPU clusters and InfiniBand networks to guarantee peak efficiency in HPC and GPU-centric settings.

• Investigating and diagnosing the root causes of issues associated with GPUs and InfiniBand networks, while recommending corrective measures.

• Incorporating new hardware into the current infrastructure, which includes supporting new GPU hardware via software frameworks like Kubernetes, QEMU, and KVM.

• Improving automation systems for proactive monitoring, identifying, and resolving issues within GPU and InfiniBand environments.

• Configuring and overseeing GPU devices and InfiniBand fabrics to ensure efficient and dependable performance.

⛳️ Requirements

• Over 5 years of professional experience in system-level software development with a focus on performance optimization and low-level programming.

• More than 3 years of practical experience with Linux systems, including administration, troubleshooting, and performance enhancement.

• Comprehensive understanding of server architecture, particularly PCIe devices, NICs, the Linux OS/Kernel, and high-performance computing (HPC) systems.

• Strong expertise in one or more performance-centric programming languages such as C/C++, Go, or Python.

🏝️ Benefits

• Competitive salary along with a comprehensive benefits package.

• Opportunities for professional advancement within Nebius.

• Flexible working arrangements available.

• A vibrant and collaborative work atmosphere that encourages initiative and innovation.

Senior HPC Cluster Engineer

📋 Description

⛳️ Requirements

🏝️ Benefits

People also viewed

Forward Deployed Engineer

Professional Services Engineer

Senior Cisco CUCM Engineer

Ingeniero de Observabilidad IA

Field Services Engineer

Technical Services Engineer

Never miss a great job!