This is a fully remote position, open to applicants in India.

📋 Description

• Perform comprehensive research and engineering on vision-language models, encompassing training, evaluation, and optimization throughout the entire model development lifecycle.

• Create and execute post-training pipelines that include supervised fine-tuning, knowledge distillation, and reinforcement learning from human feedback.

• Build and maintain high-quality multimodal datasets, including data curation, filtering, and balancing for domain-specific tasks.

• Enhance model efficiency and deployability by adapting models for environments with limited resources through compression and optimization strategies.

• Design and implement evaluation frameworks and benchmarks to assess model performance, robustness, and success in real-world tasks.

• Construct and scale training workflows across distributed GPU infrastructures.

• Identify and mitigate bottlenecks in training pipelines to attain state-of-the-art model quality on targeted benchmarks.

• Contribute to and utilize open-source ecosystems, including models, datasets, and tools, to expedite development.

• Keep up with the latest advances in multimodal learning and vision-language systems, and apply relevant findings to deliver practical improvements.

• Publish research outcomes in leading AI conferences and journals when applicable.

⛳️ Requirements

• Bachelor’s degree in Computer Science, Machine Learning, or a related discipline; MS/PhD is preferred.

• Extensive experience with multimodal post-training workflows, including supervised fine-tuning, knowledge distillation, and reinforcement learning from feedback.

• Practical experience with parameter-efficient fine-tuning and distributed training frameworks.

• Proven ability to build and enhance vision-language models with measurable outcomes on established benchmarks or real-world applications.

• Experience in adapting models for environments with limited resources.

• Documented contributions to open-source projects in multimodal AI on platforms like GitHub or HuggingFace.

• Publications in prominent AI conferences (NeurIPS, ICML, ICLR, CVPR, ECCV, etc.).

🏝️ Benefits

• Flexible working arrangements

• Professional development opportunities

AI Research Engineer – Multi-Modal, Vision
