
AI Research Engineer, Model Compression, Quantization
Posted Jun 20

This is a fully remote position, open to applicants in the Netherlands.
• Lead innovation in model compression and efficient deployment for multimodal AI systems, including large language models (LLMs) and vision-language models (VLMs).
• Reduce model size and computational cost while preserving accuracy.
• Apply and improve compression methods such as quantization, knowledge distillation, and pruning.
• Build robust compression pipelines, define performance and fidelity metrics, and resolve bottlenecks in production inference.
• Deliver scalable, low-memory, low-latency AI systems for edge devices that maintain high fidelity and provide significant real-world benefits.
• A degree in Computer Science or a related discipline.
• A PhD in NLP, Machine Learning, or a similar field is preferred, backed by a substantial track record in AI research and development (with notable publications at A* conferences).
• Proficiency in PyTorch or an equivalent deep learning framework.
• Practical experience with model quantization, including both Quantization-Aware Training (QAT) and Post-Training Quantization (PTQ).
• Research experience and practical skills in knowledge distillation for transforming large models into smaller, more efficient versions.
• Research experience and practical skills in model pruning for compressing large models into smaller, efficient structures.
• Strong understanding of neural network architectures and training methodologies, including transformers (e.g., LLMs, VLMs), backpropagation, optimization, and fine-tuning techniques.
• Knowledge of C++ is an advantage, particularly for implementing low-level quantization kernels or inference optimizations.
• Health insurance
• Remote work options
• Professional development opportunities