
Research Lead – Principal Scientist, Manager Post-Training, Alignment, Reinforcement Learning
Posted Jun 21

This is a fully remote position, open to applicants in Canada.
• Take ownership of the post-training strategy for model development, encompassing RLHF, preference optimization, agentic systems, and long-horizon reasoning.
• Create innovative algorithms that enhance model reliability, controllability, and alignment.
• Make informed architectural decisions about whether a given challenge is best addressed at the pre-training, post-training, or system level.
• Design and conduct experiments to influence model behavior, robustness, and reasoning quality.
• Collaborate with infrastructure teams to develop scalable and reproducible post-training workflows.
• Contribute to publications and patents, and enhance Autodesk's visibility in external research.
• Create evaluation frameworks focused on long-horizon reasoning, tool utilization, agentic behavior, safety, and completion of real-world workflows.
• Lead thorough model analysis and interpretability initiatives.
• Facilitate human-in-the-loop evaluations with high-quality annotations and rigorous scientific methodology.
• Define model readiness criteria and offer go/no-go recommendations for releases.
• Oversee, mentor, and develop a team of AI scientists.
• Establish technical direction and research priorities for post-training and alignment projects.
• Cultivate a research culture rooted in scientific rigor, reproducibility, and rapid iteration.
• Assist in recruiting top-tier talent in ML, RL, alignment, and foundation models.
• Work closely with pre-training teams, infrastructure, product organizations, and other stakeholders.
• Convert research trade-offs into clear, actionable guidance for leadership.
• Extensive hands-on experience in reinforcement learning for foundation models, alongside proficiency in post-training methodologies (RLHF, RLAIF, DPO, PPO, or similar techniques).
• Demonstrated experience in leading or mentoring technical research teams, whether in an academic lab, AI research organization, or industry environment.
• Strong intuition regarding model behavior, alignment challenges, and post-training trade-offs.
• Experience in designing evaluation systems and reasoning rigorously about what it means for a model to be ready for release.
• Ability to clearly articulate complex technical trade-offs to both technical and non-technical audiences.
• A PhD or equivalent depth of industry research experience in ML, RL, AI, or a related discipline.
• Health insurance
• Retirement plans
• Paid time off
• Flexible work arrangements
• Professional development
• Bonuses
• Stock options
• Equipment allowances
• Wellness programs