
Senior System Software Engineer, NCCL – Partner Enablement
Posted May 25

Posted May 25
This is a fully remote position, open to applicants in Poland.
• Collaborate with our partners and customers to identify the root causes of functional and performance issues related to NCCL.
• Perform performance characterization and analysis of NCCL and deep learning applications on cutting-edge GPU clusters.
• Create tools and automation to diagnose issues on new systems and platforms, including cloud environments (Azure, AWS, GCP, etc.).
• Advise our customers and support teams on HPC knowledge and best practices for running applications on multi-node clusters.
• Document processes and facilitate trainings/webinars for NCCL.
• Work with internal teams across various time zones on networking, GPUs, storage, infrastructure, and support.
• B.S./M.S. degree in Computer Science/Computer Engineering or equivalent experience, with over 5 years of relevant experience.
• Proficiency in parallel programming and familiarity with at least one communication runtime (MPI, NCCL, UCX, NVSHMEM).
• Strong C/C++ programming abilities, including skills in debugging, profiling, code optimization, performance analysis, and test design.
• Experience collaborating with the engineering or academic research community in support of HPC or AI.
• Practical knowledge of high-performance networking: Infiniband/RoCE/Ethernet networks, RDMA, topologies, and congestion control.
• Expertise in Linux fundamentals and a scripting language, preferably Python.
• Familiarity with containers, cloud provisioning, and scheduling tools (Docker, Docker Swarm, Kubernetes, SLURM, Ansible).
• Willingness to adapt and a strong desire to learn new areas and tools.
• Ability to work and communicate effectively across different teams and time zones.
• Highly competitive salaries.
• Comprehensive benefits package.
• Work environment that fosters diversity, inclusion, and flexibility.
Webedia
TechBiz Global
The Flex
Nodeworthy
Get handpicked remote jobs straight to your inbox weekly.