
AI Research Engineer – Kernel & Inference Optimization
Posted May 20

This is a fully remote position, open to applicants in India.
• Design and implement cutting-edge model serving architectures that achieve high throughput and low latency while optimizing memory utilization.
• Ensure that the serving pipelines operate efficiently across various environments.
• Set clear performance targets, such as lower latency, faster token response, and reduced memory usage.
• Construct, execute, and oversee controlled inference tests in both simulated and live production settings.
• Monitor key performance metrics such as response latency, throughput, memory usage, and error rates.
• Document iterative findings and compare results against predefined benchmarks.
• Identify and curate high-quality test datasets and simulation scenarios.
• Evaluate computational efficiency and troubleshoot bottlenecks within the serving pipeline.
• Collaborate closely with cross-functional teams to integrate optimized serving and inference frameworks into production workflows.
• A degree in Computer Science or a related discipline.
• Preferably a PhD in NLP, Machine Learning, or a related area, backed by a strong record in AI R&D (with notable publications in A* conferences).
• Proficiency in Metal Shading Language (MSL) is required.
• Significant experience in low-level kernel optimizations and inference optimization on mobile devices is crucial.
• A thorough comprehension of modern model serving architectures and inference optimization methodologies.
• Strong expertise in writing GPU kernels for mobile devices (i.e., smartphones) is essential.
• Hands-on experience in developing and deploying complete end-to-end inference pipelines.
• Proven ability to leverage empirical research to tackle challenges in model serving.
• Experience with distributed inference systems, including designing and enhancing high-performance inference engines.
• Opportunities for professional growth and development.
• Flexibility to work remotely from any location around the globe.
Tether.to