
AI Research Engineer, Model Compression & Quantization
Posted May 23

This is a fully remote position, open to applicants in Italy.
• Lead the advancement of model serving and inference architectures for cutting-edge AI systems.
• Improve model deployment and inference methods to achieve high performance.
• Design and implement robust inference pipelines, and define thorough performance metrics for them.
• Detect and address bottlenecks within production settings.
• Work collaboratively with cross-functional teams to incorporate optimized serving and inference frameworks into production workflows.
• A Bachelor's degree in Computer Science or a related discipline.
• A PhD in Natural Language Processing, Machine Learning, or a similar field is preferred, supported by a strong background in AI research and development (with notable publications at A* conferences).
• Must possess knowledge of Metal Shading Language (MSL).
• Experience in low-level kernel optimization and inference optimization on mobile devices is essential.
• A record of contributions that produced quantifiable improvements in inference latency, throughput, and memory usage for domain-specific applications, especially on resource-limited devices and edge platforms.
• A comprehensive understanding of contemporary model serving architectures and inference optimization strategies is necessary.
• Must have significant expertise in crafting GPU kernels for mobile devices (e.g., smartphones).
• Practical experience building and deploying end-to-end inference pipelines is required, from optimizing models for efficient serving to integrating them on resource-constrained devices.
• Competitive salary.
• Flexible working hours.
• Opportunities for professional development.
• Options for remote work.
Tether.to