This is a fully remote position, open to applicants in India.

📋 Description

• Develop and optimize GPU compute kernels targeting OpenCL and Vulkan compute backends for high-performance AI/ML workloads.

• Design, build, and extend MLIR dialects at multiple abstraction levels (frontend dialects; graph-level IR; tensor IR such as Linalg, Tensor, and TOSA; and runtime/low-level dialects) to enable efficient end-to-end model compilation.

• Implement and maintain MLIR-based compiler passes and transformations, including tiling, fusion, bufferization, vectorization, and lowering pipelines targeting OpenCL and Vulkan GPU backends.

• Profile compiled kernels and analyze bottlenecks using GPU performance counters and vendor-specific profilers, and drive performance improvements through compiler-level optimizations.

• Build and maintain GPU runtime infrastructure for both OpenCL and Vulkan, including memory management, pipeline configuration, command buffer orchestration, and resource scheduling.

• Develop and improve code generation pipelines that automate lowering from tensor IR through the MLIR stack to efficient OpenCL and Vulkan GPU kernels.

• Implement performance-critical schedules (tiling, loop fusion, parallelization, and caching strategies) in MLIR-based backends for OpenCL and Vulkan runtimes.

• Collaborate with framework teams to refine end-to-end model lowering for computer vision and LLM workloads using MLIR-based compilation stacks.

• Design and build robust compiler and runtime components in modern C/C++, applying advanced programming techniques suited to high-performance systems.

⛳️ Requirements

• Strong practical experience with the MLIR framework, including creating and extending custom dialects, writing compiler passes, and building comprehensive lowering pipelines.

• Extensive expertise across MLIR abstraction levels:

  - Frontend dialects – ingestion and representation of ML models (e.g., TOSA, StableHLO, ONNX-MLIR)
  - Graph-level IR – high-level operation fusion, shape inference, and graph transformations
  - Tensor IR level – structured operation representation utilizing Linalg, Tensor, and Vector dialects; tiling and fusion strategies
  - Runtime/low-level dialects – Bufferization, MemRef, SCF, GPU, and LLVM dialects for final code generation

• Strong practical experience in OpenCL programming, including kernel development, the OpenCL memory model, work-group/work-item optimization, and OpenCL runtime management.

• Solid understanding of Vulkan compute programming, including descriptor management, compute pipelines, synchronization primitives, and Vulkan runtime internals.

• Deep understanding of GPU architecture, memory hierarchies, and asynchronous compute.

• Proficiency in C/C++ for system-level development.

• Experience with kernel profiling and bottleneck analysis on GPU platforms.

• Strong grounding in machine learning fundamentals, covering both Computer Vision (CV) and Large Language Model (LLM) workloads.

🏝️ Benefits

• Competitive salary and performance-based bonuses.

• Comprehensive health, dental, and vision insurance.

• Opportunities for professional development and continuous learning.

• Flexible working hours and remote work options.

• A collaborative and innovative work environment.

GPU Compute, MLIR Engineer

People also viewed

Forward Deployed Engineer

Consulting Engineer, Protection & Control

Windows Engineer

Interview Engineer

Facilities Engineer

Engineer