This is a fully remote position, open to applicants in Brazil.

📋 Description

• Create and develop datasets, tasks, and environments for benchmarking agentic systems and multi-step model behavior.

• Convert real-world workflows into structured tasks, interaction traces, trajectories, stateful environments, and verifiable outcomes suitable for assessing advanced AI systems.

• Establish frameworks that evaluate diversity, realism, coverage, fidelity, informativeness, and downstream usefulness of datasets for agentic systems.

• Design quality scorecards and evaluation techniques that clarify dataset strengths, weaknesses, and failure modes across teams.

• Assess planning, tool use, robustness, recovery from failure, task completion, and generalization behavior in RL-style or agentic environments.

• Trace model failures back to specific dataset, environment, or task-design deficiencies and propose enhancements based on empirical evidence.

• Participate in the development of tools and systems that automate dataset validation, environment generation, rollout analysis, benchmark creation, and evaluation workflows.

• Enhance internal infrastructure to support reproducible experimentation, benchmark management, and evaluation quality.

• Work closely with research and engineering teams to identify data bottlenecks, refine evaluation methodologies, and establish internal best practices for task-grounded AI training data.

• Advocate for DataLab’s viewpoint in cross-functional discussions regarding dataset quality, benchmark design, and evaluation of frontier agentic systems.

⛳️ Requirements

• PhD, or a Master’s degree plus 4+ years of industry experience, in machine learning, computer science, statistics, engineering, mathematics, economics, or a related quantitative field.

• Comprehensive understanding of AI model training pipelines, evaluation methodologies, and the role of data in shaping model performance.

• Proven experience with large, unstructured, or semi-structured datasets used for training or evaluating ML systems.

• Background in reinforcement learning, sequential decision-making, agentic systems, tool-using models, or multi-step model evaluation.

• Experience in designing tasks, benchmarks, environments, simulations, or evaluation frameworks for real-world model behavior.

• Strong intuition regarding realism, coverage, difficulty, fidelity, and the meaningful structuring of outcomes in datasets.

• Solid skills in experimental design, evaluation, benchmarking, and data validation.

• High degree of ownership and the ability to independently identify and address high-impact problems.

🏝️ Benefits

• Health insurance

• 401(k) matching

• Paid time off

• Remote work options

Machine Learning Researcher – RL and Agentic Systems
