
Research Lead – Principal Scientist, Manager Post-Training, Alignment, Reinforcement Learning
Posted 1 day ago

• Own the post-training strategy for model development, including RLHF, preference optimization, agentic systems, and long-horizon reasoning.
• Create innovative algorithms that enhance model reliability, controllability, and alignment.
• Make informed architectural decisions about whether a given challenge is best addressed at the pre-training, post-training, or system level.
• Design and conduct experiments that influence model behavior, robustness, and reasoning quality.
• Collaborate with infrastructure teams to develop scalable and reproducible post-training workflows.
• Contribute to publications and patents, and increase Autodesk's visibility in the external research community.
• Develop evaluation frameworks for long-horizon reasoning, tool usage, agentic behavior, safety, and real-world workflow completion.
• Lead comprehensive model analysis and interpretability initiatives.
• Oversee human-in-the-loop evaluations, ensuring high annotation quality and sound scientific methodology.
• Define model readiness criteria and offer go/no-go recommendations for releases.
• Manage, mentor, and cultivate a team of AI scientists.
• Establish technical direction and research priorities for post-training and alignment projects.
• Promote a research culture that emphasizes scientific rigor, reproducibility, and rapid iteration.
• Assist in recruiting top-tier talent in ML, RL, alignment, and foundation models.
• Collaborate closely with pre-training teams, infrastructure, product organizations, and other stakeholders.
• Convert research trade-offs into clear, actionable guidance for leadership.
• Extensive hands-on experience in reinforcement learning for foundation models, along with proficiency in post-training methods (RLHF, RLAIF, DPO, PPO, or similar approaches).
• Demonstrated experience leading or mentoring technical research teams, whether in an academic lab, an AI research institute, or an industry environment.
• Strong intuition for model behavior, alignment challenges, and post-training trade-offs.
• Experience designing evaluation systems and rigorously defining what constitutes model readiness.
• Ability to effectively communicate complex technical trade-offs to both technical and non-technical audiences.
• A PhD or equivalent level of industry research experience in ML, RL, AI, or a related discipline.
• Health insurance
• Retirement plans
• Paid time off
• Flexible work arrangements
• Professional development
• Bonuses
• Stock options
• Equipment allowances
• Wellness programs