
Member of Engineering – Reinforcement Learning
Posted Jun 3

This is a fully remote position, open to applicants in Europe.
• Conduct research and experiments to enhance reasoning and code generation for Large Language Models (LLMs). Manage the entire experimental lifecycle from conception through experimentation to integration.
• Stay current with the latest developments in LLMs, Reinforcement Learning (RL), and code generation. Turn research concepts into clean, reusable codebases that other researchers can use.
• Design, evaluate, and refine data generation processes and the training of LLMs.
• Develop and improve RL training pipelines that are reliable across various domains.
• Identify and address training instabilities and failures, debug RL runs, and propose mitigation strategies.
• Produce high-quality, reproducible, and maintainable code.
• Experience with Large Language Models (LLMs), including:
    • Understanding of the Transformer architecture and scaling laws.
    • Familiarity with mid-training and post-training methodologies.
    • Experience in training reasoning and/or agentic models.
    • Practical experience with LLMs, understanding their capabilities and limitations.
• Background in Reinforcement Learning:
    • Strong knowledge of Reinforcement Learning principles and awareness of modern algorithms.
    • Experience in developing distributed, large-scale RL pipelines from data generation to evaluation.
• Research background:
    • Scientific publications in areas such as Reinforcement Learning, LLMs, and reasoning models.
    • Ability to discuss the latest research at a detailed level.
    • Well-informed opinions.
• Engineering expertise:
    • Strong foundation in machine learning, algorithms, and engineering.
    • Experience with distributed training.
    • Proficient programming skills in Python.
    • Familiarity with a deep learning framework such as PyTorch or JAX.
• Fully remote work with flexible hours.
• 37 days per year of vacation and holidays.
• Health insurance allowance for you and your dependents.
• Equipment provided by the company.
• Allowances for wellbeing, continuous learning, and home office setup.
• Regular team gatherings.
• A diverse, inclusive, and people-first culture.