
GPU Compute, MLIR Engineer
Posted 2 days ago

This is a fully remote position, open to applicants in India.
• Develop and optimize GPU compute kernels targeting OpenCL and Vulkan compute backends for high-performance AI/ML workloads.
• Design, build, and extend MLIR dialects across abstraction levels, including frontend dialects, graph-level IR, tensor IR (such as Linalg, Tensor, TOSA), and runtime/low-level dialects, to support effective end-to-end model compilation.
• Implement and maintain MLIR-based compiler passes and transformations, including tiling, fusion, bufferization, vectorization, and lowering pipelines targeting OpenCL and Vulkan GPU backends.
• Profile compiled kernels and analyze bottlenecks using GPU counters and vendor-specific profilers, and drive performance improvements through compiler-level optimizations.
• Build and maintain GPU runtime infrastructure for both OpenCL and Vulkan, including memory management, pipeline configuration, command buffer orchestration, and resource scheduling.
• Develop and enhance code generation pipelines, enabling automated lowering from tensor IR through MLIR to efficient OpenCL and Vulkan GPU kernels.
• Implement performance-critical schedules—including tiling, loop fusion, parallelism, and caching strategies—within MLIR-based backends designed for OpenCL and Vulkan runtimes.
• Collaborate with framework teams to refine end-to-end model lowering for computer vision and LLM workloads utilizing MLIR compilation stacks.
• Design and create robust compiler and runtime components using modern C/C++, leveraging advanced programming paradigms for high-performance systems.
• Strong practical experience with the MLIR framework, including creating and extending custom dialects, writing compiler passes, and building comprehensive lowering pipelines.
• Extensive expertise across MLIR abstraction levels:
  - Frontend dialects – ingestion and representation of ML models (e.g., TOSA, StableHLO, ONNX-MLIR)
  - Graph-level IR – high-level operation fusion, shape inference, and graph transformations
  - Tensor IR level – structured operation representation utilizing Linalg, Tensor, and Vector dialects; tiling and fusion strategies
  - Runtime/low-level dialects – Bufferization, MemRef, SCF, GPU, and LLVM dialects for final code generation
• Strong practical experience in OpenCL programming, including kernel development, the OpenCL memory model, work-group/work-item optimization, and OpenCL runtime management.
• Solid comprehension of Vulkan compute programming, including descriptor management, compute pipelines, synchronization primitives, and Vulkan runtime internals.
• Deep understanding of GPU architecture, memory hierarchies, and asynchronous compute.
• Proficiency in C/C++ for system-level development.
• Experience with kernel profiling and bottleneck analysis on GPU platforms.
• Strong foundation in machine learning fundamentals, covering both Computer Vision (CV) and Large Language Model (LLM) workloads.
• Competitive salary and performance-based bonuses.
• Comprehensive health, dental, and vision insurance.
• Opportunities for professional development and continuous learning.
• Flexible working hours and remote work options.
• A collaborative and innovative work environment.