Remotery

AI Research Engineer, Model Compression, Quantization

Posted May 30

This is a fully remote position, open to applicants in Brazil.

📋 Description

• Implement low-bit quantization techniques aimed at decreasing model size and inference latency for generative AI models (LLMs, VLMs, multimodal), while preserving accuracy and output quality.

• Use knowledge distillation to transfer capabilities from larger teacher models to smaller student models, enabling efficient multimodal reasoning across text, image, and audio inputs.

• Apply pruning methods to eliminate redundant parameters and attention heads, thus minimizing computational demands without compromising task performance.

• Evaluate the trade-offs between model efficiency (size, latency, memory) and accuracy across quantization, distillation, and pruning techniques, and suggest enhancements based on empirical data.

• Conduct research and implement mixed-precision quantization along with other advanced compression strategies (e.g., adaptive pruning schedules, distillation with intermediate feature matching) to optimize the balance between accuracy and performance.

• Stay up to date with the latest advancements in model compression, focusing on emerging techniques for multimodal and generative architectures.

• Clearly document methodologies, experiments, and results to ensure reproducibility, facilitate internal collaboration, and enhance stakeholder communication.

• Write and publish technical papers at prestigious conferences (e.g., NeurIPS, ICML, ICLR, CVPR, ACL, AAAI) to contribute to the advancement of model compression in multimodal AI.


⛳️ Requirements

• A degree in Computer Science or a related discipline.

• Preferably a PhD in NLP, Machine Learning, or a related field, supported by a strong record in AI R&D (with notable publications in A* conferences).

• Proficiency in PyTorch or a similar deep learning framework.

• Practical experience with model quantization, including both Quantization-Aware Training (QAT) and Post-Training Quantization (PTQ).

• Research and practical experience in knowledge distillation for compressing large models into smaller, more efficient versions.

• Research and practical experience in model pruning for reducing large models to smaller, efficient alternatives.

• Strong understanding of neural network architectures and training methodologies, including transformers (e.g., LLMs, VLMs), backpropagation, optimization, and fine-tuning techniques.

• Familiarity with C++ is advantageous (especially for implementing low-level quantization kernels or inference optimizations).


🏝️ Benefits

• Flexible working arrangements.

• Opportunities for professional development.
