This is a fully remote position, open to applicants in North America.

📋 Description

• Lead the charge in innovating model serving and inference architectures for cutting-edge AI systems.

• Concentrate on enhancing model deployment and inference methodologies.

• Engage with a diverse range of systems, from efficient models to intricate, multi-modal architectures.

• Design, test, and execute innovative serving strategies and inference algorithms.

• Create robust inference pipelines, set performance benchmarks, and address bottlenecks within production settings.

• Deliver high-throughput, low-latency, memory-efficient, and scalable AI performance that yields significant value.

⛳️ Requirements

• A degree in Computer Science or a related discipline.

• A PhD in NLP, Machine Learning, or a comparable field is preferred, backed by a strong track record in AI research and development (with notable publications at A* conferences).

• Familiarity with Metal Shading Language (MSL) is essential.

• Demonstrated experience in low-level kernel optimizations and inference enhancements on mobile devices is crucial.

• Your work should have resulted in quantifiable improvements in inference latency, throughput, and memory usage for specialized applications, especially on devices with limited resources and edge platforms.

• A thorough understanding of contemporary model serving architectures and inference optimization strategies is required.

• Strong proficiency in developing GPU kernels for mobile devices (e.g., smartphones).

• Hands-on experience creating and deploying end-to-end inference pipelines, from optimizing models for efficient serving to integrating them on resource-limited devices, is necessary.

• Proven ability to leverage empirical research to tackle challenges in model serving, including latency reduction, computational bottlenecks, and memory limitations.

• Skilled in designing rigorous evaluation frameworks and refining optimization strategies to continually improve inference performance and system efficiency.

• Experience in distributed inference systems: designing and refining high-performance inference engines using techniques such as tensor parallelism, pipeline parallelism, and expert parallelism to serve large models on GPU clusters.

• Profound understanding of the mathematics and framework behind Diffusion Models and Vision Transformers.

🏝️ Benefits

• Health insurance

• Flexible working hours

• Paid time off

• Professional development opportunities

AI Research Engineer – Kernel & Inference Optimization

