
AI Research Engineer – Reinforcement Learning
Posted May 20

This is a fully remote position, open to applicants in Hungary.
• Design and implement cutting-edge reinforcement learning algorithms aimed at enhancing decision-making processes in both simulated and real-world environments.
• Set explicit performance goals such as reward maximization and policy stability.
• Execute, manage, and observe controlled reinforcement learning experiments.
• Monitor key performance indicators while documenting iterative results and comparing them against predefined benchmarks.
• Identify and curate high-quality simulation environments and training datasets that are specifically tailored to address domain-specific challenges.
• Establish measurable criteria to ensure that the selection and preparation of these resources significantly improve the learning process and overall model performance.
• Systematically troubleshoot and optimize the reinforcement learning pipeline by assessing both computational efficiency and learning performance metrics.
• Tackle issues such as reward signal noise, suboptimal exploration strategies, and policy divergence to improve convergence and stability.
• Collaborate with cross-functional teams to incorporate reinforcement learning agents into production systems.
• Define clear success metrics, such as gains in real-world performance and robustness under varying conditions, and ensure continuous monitoring and iterative updates to sustain domain adaptation.
• A degree in Computer Science or a related field.
• Preferably a PhD in NLP, Machine Learning, or a related discipline, with a strong AI R&D track record (including notable publications at A* conferences).
• Demonstrated experience with large-scale reinforcement learning experiments, particularly online RL techniques such as Group Relative Policy Optimization (GRPO), is essential.
• A deep understanding of reinforcement learning algorithms, including state-of-the-art online RL methods and gradient-based optimization techniques such as policy gradients, actor-critic, and GRPO, is required.
• Strong proficiency in PyTorch and relevant reinforcement learning frameworks is mandatory.
• Practical experience developing RL pipelines, from simulation and online training to post-training evaluation and deployment of RL-based solutions in production, is expected.
• Proven ability to apply empirical research to address reinforcement learning challenges such as sample inefficiency, exploration-exploitation trade-offs, and training instability.
• Skilled in designing robust evaluation frameworks and iterating on algorithmic innovations to continually advance RL agent performance.
• Work remotely from anywhere in the world.
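Group Relative Policy Optimization (GRPO) is named above as a core requirement. As a rough illustration of what "group relative" means, the sketch below scores several completions sampled for the same prompt against each other rather than against a learned value function, then plugs the resulting advantage into a PPO-style clipped objective. This is a minimal, simplified sketch: it assumes one scalar reward per completion, uses the population standard deviation with a small `eps`, and omits the KL penalty to a reference policy and per-token averaging that full implementations include. The function names `group_relative_advantages` and `clipped_surrogate` are hypothetical, not part of any library.

```python
import math
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against its own group's mean and std.

    `rewards` holds the scalar rewards of G completions sampled for the
    same prompt (hypothetical setup). `eps` guards against division by
    zero when every completion receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # assumption: population std, not sample std
    return [(r - mu) / (sigma + eps) for r in rewards]


def clipped_surrogate(logp_new, logp_old, advantage, clip_eps=0.2):
    """PPO-style clipped objective for a single sample (to be maximized).

    The probability ratio between the new and old policy is clipped to
    [1 - clip_eps, 1 + clip_eps]; taking the minimum keeps the update
    pessimistic, which limits how far the policy moves per step.
    """
    ratio = math.exp(logp_new - logp_old)
    clipped_ratio = max(min(ratio, 1 + clip_eps), 1 - clip_eps)
    return min(ratio * advantage, clipped_ratio * advantage)


if __name__ == "__main__":
    # Hypothetical rewards for 4 completions of one prompt (1 = correct answer).
    rewards = [1.0, 0.0, 0.0, 1.0]
    advantages = group_relative_advantages(rewards)
    print([round(a, 3) for a in advantages])  # [1.0, -1.0, -1.0, 1.0]
```

Because advantages are computed relative to the group, completions that beat their siblings are reinforced and the rest are penalized, with no separate critic network to train; this is the main practical difference from the actor-critic methods listed alongside it.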
Tether.to