
Senior Software Engineer, AI Inference Systems
Posted May 2
• Contribute features to vLLM that enhance the latest models with cutting-edge NVIDIA GPU hardware capabilities; profile and optimize the inference framework (vLLM) utilizing techniques such as speculative decoding, data/tensor/expert/pipeline-parallelism, and prefill-decode disaggregation.
• Develop, refine, and benchmark GPU kernels (both hand-tuned and compiler-generated) employing strategies like fusion, autotuning, and memory/layout optimization; create and expand high-level DSLs and compiler infrastructure to improve kernel developer productivity while nearing peak hardware utilization.
• Define and establish inference benchmarking methodologies and tools; contribute new benchmarks as well as NVIDIA's submissions to the industry-standard MLPerf Inference benchmark suite.
• Design the scheduling and orchestration of containerized large-scale inference deployments on GPU clusters across various cloud environments.
• Conduct and publish original research that advances the Pareto frontier in ML Systems; survey recent publications and identify ways to integrate research concepts and prototypes into NVIDIA’s software offerings.
• Bachelor’s degree (or equivalent experience) in Computer Science (CS), Computer Engineering (CE), or Software Engineering (SE) with 7+ years of experience; alternatively, a Master’s degree in CS/CE/SE with 5+ years of experience; or a PhD degree with a thesis and publications in top-tier venues related to ML Systems, GPU architecture, or high-performance computing.
• Strong programming expertise in Python and C/C++; experience with Go or Rust is advantageous; solid grasp of computer science fundamentals: algorithms & data structures, operating systems, computer architecture, parallel programming, distributed systems, and deep learning theory.
• Knowledgeable and enthusiastic about performance engineering in ML frameworks (e.g., PyTorch) and inference engines (e.g., vLLM and SGLang).
• Familiarity with GPU programming and performance aspects: CUDA, memory hierarchy, streams, NCCL; skilled in using profiling/debugging tools (e.g., Nsight Systems/Compute).
• Experience with containers and orchestration (Docker, Kubernetes, Slurm); understanding of Linux namespaces and cgroups.
• Exceptional debugging, problem-solving, and communication skills; able to thrive in a fast-paced, cross-functional environment.
• Competitive salary and performance-based bonuses.
• Comprehensive health, dental, and vision insurance.
• Generous paid time off and holiday leave.
• Opportunities for professional development and career growth.