This is a fully remote position, open to applicants in California and two other states.

📋 Description

• Research, implement, and validate modifications to model architectures and algorithms that improve video generation fidelity, with a particular focus on human-centric quality.

• Investigate and prototype enhancements in spatial multimodal modeling, modality alignment, flow-based or diffusion-based video generation, and representations inspired by neural rendering to boost controllability and long-horizon consistency.

• Enhance training and inference efficiency through architectural innovations and post-training strategies, including compute/memory optimizations, distillation, pruning, and compression.

• Establish model training objectives that foster improvements in sim-to-real and real-to-sim generalization, particularly regarding human motion, contact, and interaction dynamics across both real-world and synthetic/simulation data.

• Create comprehensive, domain-specific benchmarks for assessing world foundation models, particularly in the generation and interpretation of world models that reason about video, simulation, and physical environments.

• Convert research findings into robust implementations such as training code, production-ready checkpoints, model integrations, and demonstrations that effectively illustrate capability enhancements across teams.

⛳️ Requirements

• A PhD in Computer Science, Graphics, Computer Engineering, or a closely related field (or equivalent experience).

• Over 8 years of applied research and/or industry experience in vision, graphics, or related ML domains.

• More than 3 years of direct experience in designing, training, and evaluating generative models for image/video/audio, with a solid foundation in modern deep learning.

• Practical experience in enhancing generative models with an emphasis on perceptual quality and temporal stability, particularly in generating human subjects.

• Advanced skills in Python, PyTorch, C++, and CUDA, along with strong research-engineering practices (reproducibility, testing, profiling, experiment tracking).

• Experience in training and debugging large models within multi-GPU and/or multi-node environments and distributed training workflows.

• Working knowledge of inference/runtime bottlenecks and optimization strategies.

• A keen “eye for quality” and a passion for diagnosing visual artifacts (sharpness, texture detail, temporal stability, etc.) using perceptual metrics, human preference signals, or learned evaluators.

🏝️ Benefits

• Equity

• Benefits

Senior AI Researcher – World Foundation Models

