This is a fully remote position, open to applicants in Europe.

📋 Description

• Conduct research and experiments to enhance reasoning and code generation for Large Language Models (LLMs). Manage the entire experimental lifecycle from conception through experimentation to integration.

• Keep up with the latest developments in LLMs, Reinforcement Learning (RL), and code generation. Transform research concepts into clean, reusable codebases that other researchers can use.

• Design, evaluate, and refine data generation processes and the training of LLMs.

• Develop and improve RL training pipelines that are reliable across various domains.

• Identify and address training instabilities and failures, debug RL runs, and propose mitigation strategies.

• Produce high-quality, reproducible, and maintainable code.

⛳️ Requirements

• Experience with Large Language Models (LLMs), including:

  • Understanding of the Transformer architecture and scaling laws.

  • Familiarity with mid-training and post-training methodologies.

  • Experience in training reasoning and/or agentic models.

  • Practical experience with LLMs and an understanding of their capabilities and limitations.

• Background in Reinforcement Learning:

  • Strong knowledge of Reinforcement Learning principles and awareness of modern algorithms.

  • Experience in developing distributed, large-scale RL pipelines from data generation to evaluation.

• Research background:

  • Scientific publications in areas such as Reinforcement Learning, LLMs, and reasoning models.

  • Ability to discuss the latest research in detail.

  • Well-informed opinions.

• Engineering expertise:

  • Strong foundation in machine learning, algorithms, and engineering.

  • Experience with distributed training.

  • Proficient programming skills in Python.

  • Familiarity with a deep learning framework such as PyTorch or JAX.

🏝️ Benefits

• Fully remote work with flexible hours.

• 37 days of vacation and holidays per year.

• Health insurance allowance for you and your dependents.

• Equipment provided by the company.

• Allowances for wellbeing, continuous learning, and home office setup.

• Regular team gatherings.

• A diverse, inclusive, and people-first culture.

Member of Engineering – Reinforcement Learning

