This is a fully remote position, open to applicants in Europe.

• Stay current with the latest research and maintain a solid understanding of recent advances in LLMs, RL, and code generation.

• Create techniques for optimizing training and inference processes to achieve high throughput.

• Design data control systems within an RL pipeline that determine what the model observes and when.

• Identify and troubleshoot instances where infrastructure choices are adversely affecting learning dynamics.

• Develop observability tools that highlight when a system-level issue is the underlying cause of a training regression.

• Contribute to the construction of robust, adaptable, and scalable RL pipelines.

• Enhance performance across the entire stack, including networking, memory, compute scheduling, and I/O.

• Produce high-quality, practical code.

• Collaborate with the team: plan future actions, engage in discussions, and maintain constant communication.

• Proven experience with LLMs and post-training workflows.

• Knowledge of Reinforcement Learning principles and awareness of its primary challenges.

• Strong foundation in software engineering (testing, code reviews, debugging complex systems).

• Proficient in Python, with expertise in concurrency, asynchronous programming, multiprocessing, and performance optimization.

• Familiarity with deep learning frameworks (such as PyTorch or JAX) and RL workflows (rollouts, replay buffers, policy updates).

• Experience in designing and maintaining distributed RL training systems.

• Background in large-scale LLM training infrastructure.

• Proficient with profiling tools across the stack (e.g., py-spy).

• Familiarity with inference stacks (e.g., vLLM).

• Preferred: Contributions to open-source RL or distributed ML projects.

• Fully remote work with flexible hours.

• 37 days of vacation and holidays each year.

• Health insurance allowance for you and your dependents.

• Equipment provided by the company.

• Wellbeing, continuous learning, and home office allowances.

• Regular team gatherings.

• A diverse and inclusive people-first culture.

Member of Engineering – Reinforcement Learning Infrastructure
