
Staff Research Scientist – Reinforcement Learning
Posted 1 day ago

This is a fully remote position, open to applicants in California.
• Create simulation environments and digital twins tailored for enterprise workflows.
• Fine-tune LLM agents using RLHF, DPO, GRPO, PPO, and novel methods.
• Develop pipelines that transform human-annotated traces and verifiable signals into training datasets.
• Design multi-turn, tool-using agents with closed-loop learning.
• Establish reward functions and verification methods that resist reward hacking and accurately reflect actual task outcomes.
• Set the technical standards for the team — including architecture, code review, and engineering practices.
• Guide and mentor researchers and engineers; influence technical direction.
• Convert research findings into production-ready solutions; participate in publishing efforts.
• Over 7 years of experience in ML/AI research or engineering; at least 3 years at a senior or staff level.
• Master’s or PhD in Computer Science, Machine Learning, or a related field (or equivalent experience).
• Minimum of 5 years of practical experience in RL (environment design, reward engineering, and policy optimization), with at least one production deployment of LLM post-training.
• At least 3 years of hands-on experience fine-tuning LLMs with RL post-training techniques (RLHF, DPO, GRPO, PPO).
• Expert-level proficiency in implementing RLHF pipelines, reward modeling (Bradley-Terry), DPO, and KTO.
• Strong Python programming and software engineering skills; able to build production pipelines, not just notebooks.
• In-depth knowledge of MDPs, policy gradient and actor-critic methods (PPO, SAC), and temporal difference learning.
• Familiarity with modern post-training and rollout-serving libraries (TRL, veRL, OpenRLHF, SkyRL).
• Comprehensive health insurance coverage.
• 401(k) matching program.
• Flexible working hours.
• Paid time off.
• Options for remote work.