This is a fully remote position, open to applicants in the Netherlands.

📋 Description

• Optimizing the performance of clusters and InfiniBand networks to guarantee peak functionality in HPC and GPU-centric environments.

• Investigating and diagnosing the underlying causes of issues pertaining to GPUs and InfiniBand networks, and recommending corrective measures.

• Incorporating new hardware into the current infrastructure, including enabling support for new GPU hardware via software stacks such as Kubernetes, QEMU, and KVM.

• Advancing automation systems for proactively monitoring, identifying, and resolving issues in GPU and InfiniBand environments.

• Setting up and overseeing GPU devices and InfiniBand fabrics to ensure effective and dependable operation.

⛳️ Requirements

• Over 5 years of professional experience in system-level software development, emphasizing performance optimization and low-level programming.

• More than 3 years of practical experience with Linux systems, including administration, troubleshooting, and/or performance tuning.

• Proficiency with essential tools for kernel profiling and tuning, including perf, ftrace, and (e)BPF.

• Comprehensive knowledge of server architecture, encompassing PCIe devices, NICs, Linux OS/Kernel, etc.

• Strong command of one or more performance-focused programming languages such as C/C++, Go, or Python.

It would be advantageous (though not essential) if you possess:

• Experience in GPU end-to-end testing within a cluster setup utilizing InfiniBand networking.

• A proven history of analyzing and enhancing the performance of HPC workloads, including simulations, data analysis, and AI/ML tasks.

• Familiarity with RDMA, RoCE, and InfiniBand protocols for high-performance communication.

• Background in Software-Defined Networking (SDN) along with experience in HPC cluster networking.

• Understanding of QEMU/KVM virtualization and management of virtualized environments.

• Experience with deep learning frameworks like PyTorch and TensorFlow, and their integration into HPC systems.

• Knowledge of collective communication libraries such as MPI and NCCL for distributed computing.

🏝️ Benefits

• Flexible working arrangements.

• A dynamic and collaborative work environment that encourages initiative and innovation.

Senior Linux Kernel Engineer – High-Performance Computing
