Remotery

AI Research Engineer – Kernel & Inference Optimization

Posted May 20

This is a fully remote position, open to applicants in India.

📋 Description

• Design and implement cutting-edge model serving architectures that achieve high throughput and low latency while optimizing memory utilization.

• Ensure that the resulting serving pipelines operate efficiently across various environments.

• Set clear performance targets, such as lower latency, faster token generation, and reduced memory usage.

• Construct, execute, and oversee controlled inference tests in both simulated and live production settings.

• Monitor key performance metrics such as response latency, throughput, memory usage, and error rates.

• Document iterative findings and compare results against predefined benchmarks.

• Identify and curate high-quality test datasets and simulation scenarios.

• Evaluate computational efficiency and troubleshoot bottlenecks within the serving pipeline.

• Collaborate closely with cross-functional teams to integrate optimized serving and inference frameworks into production workflows.


⛳️ Requirements

• A degree in Computer Science or a related discipline.

• Preferably a PhD in NLP, Machine Learning, or a related area, backed by a strong record in AI R&D (with notable publications in A* conferences).

• Proficiency in Metal Shading Language (MSL) is required.

• Significant experience in low-level kernel optimizations and inference optimization on mobile devices is crucial.

• A thorough comprehension of modern model serving architectures and inference optimization methodologies.

• Strong expertise in writing GPU kernels for mobile devices (e.g., smartphones) is essential.

• Hands-on experience in developing and deploying complete end-to-end inference pipelines.

• Proven ability to leverage empirical research to tackle challenges in model serving.

• Experience with distributed inference systems, including designing and improving high-performance inference engines.


🏝️ Benefits

• Opportunities for professional growth and development.

• Flexibility to work remotely from any location around the globe.
