Remotery

Staff Research Scientist – Reinforcement Learning

Posted 1 day ago

This is a fully remote position, open to applicants in California.

📋 Description

• Create simulation environments and digital twins tailored for enterprise workflows.

• Fine-tune LLM agents using RLHF, DPO, GRPO, PPO, and newer methods.

• Develop pipelines that transform human-annotated traces and verifiable signals into training datasets.

• Design multi-turn, tool-using agents with closed-loop learning.

• Design reward functions and verification methods that resist reward hacking and accurately reflect real task outcomes.

• Set the technical standards for the team — including architecture, code review, and engineering practices.

• Guide and mentor researchers and engineers; influence technical direction.

• Turn research findings into production-ready solutions; contribute to publications.


⛳️ Requirements

• Over 7 years of experience in ML/AI research or engineering, including at least 3 years at a senior or staff level.

• Master’s or PhD in Computer Science, Machine Learning, or a related field (or equivalent experience).

• Minimum of 5 years of practical experience in RL (environment design, reward engineering, and policy optimization), with at least one production deployment of LLM post-training.

• At least 3 years of hands-on experience fine-tuning LLMs with RL post-training techniques (RLHF, DPO, GRPO, PPO).

• Expert-level proficiency in implementing RLHF pipelines, reward modeling (Bradley-Terry), DPO, and KTO.

• Strong Python programming and software engineering skills; able to build production pipelines, not just notebooks.

• In-depth knowledge of MDPs, policy gradient and actor-critic methods (PPO, SAC), and temporal difference learning.

• Familiarity with modern post-training and rollout-serving libraries (TRL, veRL, OpenRLHF, SkyRL).


🏝️ Benefits

• Comprehensive health insurance coverage.

• 401(k) matching program.

• Flexible working hours.

• Paid time off.

• Options for remote work.
