
Senior HPC Performance Engineer
Posted Jun 3

Posted Jun 3
This is a fully remote position, open to applicants in Poland.
β’ Perform comprehensive performance characterization and analysis on extensive multi-GPU and multi-node clusters.
β’ Investigate the interaction of our libraries with all hardware (GPU, CPU, Networking) and software components in the ecosystem.
β’ Assess proof-of-concepts and carry out trade-off evaluations when multiple solutions are available.
β’ Diagnose and identify the root causes of performance issues reported by our clients.
β’ Gather substantial performance data and develop tools and infrastructure to visualize and analyze this information.
β’ Collaborate with a highly dynamic team across various time zones.
β’ M.S. (or equivalent experience) or Ph.D. in Computer Science or a related discipline.
β’ Over 3 years of experience in parallel programming and familiarity with at least one communication runtime (MPI, NCCL, UCX, NVSHMEM).
β’ Experience in conducting performance benchmarking and triaging on large-scale HPC clusters.
β’ Strong understanding of computer system architecture, hardware-software interactions, and operating system principles (i.e., systems software fundamentals).
β’ Ability to implement micro-benchmarks in C/C++ and modify the codebase as necessary.
β’ Capability to debug performance issues across the entire hardware/software stack.
β’ Proficient in a scripting language, preferably Python.
β’ Familiarity with containers, cloud provisioning, and scheduling tools (Kubernetes, SLURM, Ansible, Docker).
β’ Eagerness to learn new areas and tools, demonstrating adaptability.
β’ Flexibility to work and communicate effectively across diverse teams and time zones.
β’ Competitive salaries.
β’ Extensive benefits package.
β’ Work environment that fosters diversity, inclusion, and flexibility.
Akka (formerly Lightbend)
Swimlane
Get handpicked remote jobs straight to your inbox weekly.