
Machine Learning Researcher – RL and Agentic Systems
Posted Jun 3

This is a fully remote position, open to applicants in Brazil.
• Create and develop datasets, tasks, and environments for benchmarking agentic systems and multi-step model behavior.
• Convert real-world workflows into structured tasks, interaction traces, trajectories, stateful environments, and verifiable outcomes suitable for assessing advanced AI systems.
• Establish frameworks that evaluate diversity, realism, coverage, fidelity, informativeness, and downstream usefulness of datasets for agentic systems.
• Design quality scorecards and evaluation techniques that clarify dataset strengths, weaknesses, and failure modes across teams.
• Assess planning, tool utilization, robustness, recovery from failure, task completion, and generalization behavior in RL-style or agentic environments.
• Trace model failures back to specific dataset, environment, or task-design deficiencies and propose enhancements based on empirical evidence.
• Participate in the development of tools and systems that automate dataset validation, environment generation, rollout analysis, benchmark creation, and evaluation workflows.
• Enhance internal infrastructure to support reproducible experimentation, benchmark management, and evaluation quality.
• Work closely with research and engineering teams to identify data bottlenecks, refine evaluation methodologies, and establish internal best practices for task-grounded AI training data.
• Advocate for DataLab’s viewpoint in cross-functional discussions regarding dataset quality, benchmark design, and evaluation of frontier agentic systems.
• PhD, or Master’s degree with 4+ years of industry experience, in machine learning, computer science, statistics, engineering, mathematics, economics, or a related quantitative field.
• Comprehensive understanding of AI model training pipelines, evaluation methodologies, and the significance of data in influencing model performance.
• Proven experience with large, unstructured, or semi-structured datasets utilized for training or evaluating ML systems.
• Background in reinforcement learning, sequential decision-making, agentic systems, tool-using models, or multi-step model evaluation.
• Experience in designing tasks, benchmarks, environments, simulations, or evaluation frameworks for real-world model behavior.
• Strong intuition regarding realism, coverage, difficulty, fidelity, and the meaningful structuring of outcomes in datasets.
• Solid skills in experimental design, evaluation, benchmarking, and data validation.
• High degree of ownership and the capability to independently identify and address high-impact problems.
• Health insurance
• 401(k) matching
• Paid time off
• Remote work options