This is a fully remote position, open to applicants in India.

• Design and implement cutting-edge model serving architectures that achieve high throughput and low latency while optimizing memory utilization.

• Ensure that these serving pipelines operate efficiently across various environments.

• Set clear performance targets, such as lower latency, faster token generation, and reduced memory usage.

• Construct, execute, and oversee controlled inference tests in both simulated and live production settings.

• Monitor key performance metrics such as response latency, throughput, memory usage, and error rates.

• Document iterative findings and compare results against predefined benchmarks.

• Identify and curate high-quality test datasets and simulation scenarios.

• Evaluate computational efficiency and troubleshoot bottlenecks within the serving pipeline.

• Collaborate closely with cross-functional teams to integrate optimized serving and inference frameworks into production workflows.

• A degree in Computer Science or a related discipline.

• Preferably a PhD in NLP, Machine Learning, or a related area, backed by a strong record in AI R&D (with notable publications in A* conferences).

• Proficiency in Metal Shading Language (MSL) is required.

• Significant experience in low-level kernel optimizations and inference optimization on mobile devices is crucial.

• A thorough comprehension of modern model serving architectures and inference optimization methodologies.

• Strong expertise in writing GPU kernels for mobile devices (i.e., smartphones) is essential.

• Hands-on experience in developing and deploying complete end-to-end inference pipelines.

• Proven ability to leverage empirical research to tackle challenges in model serving.

• Experience with distributed inference systems, including designing and enhancing high-performance inference engines.

• Opportunities for professional growth and development.

• Flexibility to work remotely from any location around the globe.

AI Research Engineer – Kernel & Inference Optimization
