
AI Research Engineer – Model Compression, Quantization
Posted May 20

This is a fully remote position, open to applicants in Spain.
• Lead the development of novel techniques for model compression and efficient deployment of advanced multimodal AI systems.
• Focus on reducing model size and computational cost while maintaining accuracy.
• Apply and improve compression methods such as quantization, knowledge distillation, and pruning.
• Build robust compression pipelines and establish performance and fidelity benchmarks.
• Identify and resolve inference bottlenecks in production.
• Provide scalable, low-memory, and low-latency AI solutions for edge devices (e.g., smartphones) that uphold high fidelity and real-world applicability.
• A degree in Computer Science or a related discipline.
• Preferably a PhD in NLP, Machine Learning, or a related field, along with a strong track record in AI research and development, including notable publications at A* conferences.
• Proficiency with PyTorch or similar deep learning frameworks.
• Hands-on experience with model quantization, including both Quantization-Aware Training (QAT) and Post-Training Quantization (PTQ).
• Research and hands-on experience in knowledge distillation for compressing large models into smaller, efficient variants.
• Research and hands-on experience in model pruning for reducing large models to smaller, efficient forms.
• Strong grasp of neural network architectures and training methodologies, including transformers (e.g., LLMs, VLMs), backpropagation, optimization, and fine-tuning strategies.
• Familiarity with C++ is a plus, particularly for implementing low-level quantization kernels or optimizing inference.
Tether.to