This is a fully remote position, open to applicants in California.

📋 Description

• Create simulation environments and digital twins tailored for enterprise workflows.

• Fine-tune LLM agents using RLHF, DPO, GRPO, PPO, and novel methods.

• Develop pipelines that transform human-annotated traces and verifiable signals into training datasets.

• Design multi-turn, tool-using agents with closed-loop learning.

• Design reward functions and verifiers that resist reward hacking and faithfully reflect real task outcomes.

• Set the team's technical standards for architecture, code review, and engineering practices.

• Mentor researchers and engineers and help shape the team's technical direction.

• Translate research results into production systems and contribute to publications.

⛳️ Requirements

• 7+ years of ML/AI research or engineering experience, including at least 3 years at a senior or staff level.

• Master’s or PhD in Computer Science, Machine Learning, or a related field (or equivalent experience).

• At least 5 years of hands-on RL experience (environment design, reward engineering, and policy optimization), including at least one production deployment of LLM post-training.

• At least 3 years of hands-on experience fine-tuning LLMs with RL post-training techniques (RLHF, DPO, GRPO, PPO).

• Expert-level ability to implement RLHF pipelines, reward models (Bradley-Terry), DPO, and KTO.

• Strong Python and software engineering skills: able to build production pipelines, not just notebooks.

• In-depth knowledge of MDPs, policy gradient and actor-critic methods (PPO, SAC), and temporal-difference learning.

• Familiarity with modern post-training and rollout-serving libraries (TRL, veRL, OpenRLHF, SkyRL).

🏝️ Benefits

• Comprehensive health insurance coverage.

• 401(k) matching program.

• Flexible working hours.

• Paid time off.

• Options for remote work.

Staff Research Scientist – Reinforcement Learning


People also viewed

Principal Scientist, Immunology

Senior Research Scientist, Battery Materials Simulation

Research Scientist, Battery Materials Simulation

Senior Scientist – Mathematical & Physics Signals Modeling, Experimental Interface

Chargé.e de Recherche et d'Analyse, Stagiaire fin d'études

Senior UX Researcher – Short-Term Contract, 1099
