
Senior Software Engineer, AI Inference Systems
Posted May 2
• Contribute features to vLLM that enhance the latest models with cutting-edge NVIDIA GPU hardware capabilities; profile and optimize the inference framework (vLLM) utilizing techniques such as speculative decoding, data/tensor/expert/pipeline-parallelism, and prefill-decode disaggregation.
• Develop, refine, and benchmark GPU kernels (both hand-tuned and compiler-generated) employing strategies like fusion, autotuning, and memory/layout optimization; create and expand high-level DSLs and compiler infrastructure to improve kernel developer productivity while nearing peak hardware utilization.
• Define and establish inference benchmarking methodologies and tools; contribute new benchmarks as well as NVIDIA's submissions to the industry-standard MLPerf Inference benchmark suite.
• Design the scheduling and orchestration of containerized large-scale inference deployments on GPU clusters across various cloud environments.
• Conduct and publish original research that advances the Pareto frontier in ML Systems; survey recent publications and identify ways to integrate research concepts and prototypes into NVIDIA’s software offerings.
• Bachelor’s degree (or equivalent experience) in Computer Science (CS), Computer Engineering (CE), or Software Engineering (SE) with 7+ years of experience; alternatively, a Master’s degree in CS/CE/SE with 5+ years of experience; or a PhD degree with a thesis and publications in top-tier venues related to ML Systems, GPU architecture, or high-performance computing.
• Strong programming expertise in Python and C/C++; experience with Go or Rust is advantageous; solid grasp of computer science fundamentals: algorithms & data structures, operating systems, computer architecture, parallel programming, distributed systems, and deep learning theory.
• Knowledgeable and enthusiastic about performance engineering in ML frameworks (e.g., PyTorch) and inference engines (e.g., vLLM and SGLang).
• Familiarity with GPU programming and performance aspects: CUDA, memory hierarchy, streams, NCCL; skilled in using profiling/debugging tools (e.g., Nsight Systems/Compute).
• Experience with containers and orchestration (Docker, Kubernetes, Slurm); understanding of Linux namespaces and cgroups.
• Exceptional debugging, problem-solving, and communication skills; able to thrive in a fast-paced, cross-functional environment.
• Competitive salary and performance-based bonuses.
• Comprehensive health, dental, and vision insurance.
• Generous paid time off and holiday leave.
• Opportunities for professional development and career growth.