Remotery

AI Evaluation Engineer, Knowledge and Research

Posted May 21

This is a fully remote position, open to applicants in Colombia.

📋 Description

• Develop benchmark tasks for multi-agent systems that require reading, analyzing, and synthesizing large document collections.

• Assemble real-world research datasets (academic papers, case studies, and technical reports) and write questions that demand thorough analysis.

• Write structured ground-truth oracles (JSON) containing specific, verifiable answers that confirm the agent actually engaged with the source material.

• Create prompts for LLM judges to assess agent outputs on a field-by-field basis against the oracle.

• Write decomposition guides that split research tasks across multiple parallel sub-agents (one per document, one per domain, followed by a synthesis step).


⛳️ Requirements

• Over 5 years of experience in **research (academic or industry)** within a scientific, technical, or analytical field.

• Strong capability to **read, analyze, and extract structured information from unstructured documents**.

• Experience in designing or handling **structured data formats (JSON, schemas, validation)**.

• Proficiency in **Python scripting** (for data processing, validation, or evaluation scripts).

• Background in **AI evaluation, coding benchmarks, or structured reasoning tasks** (such as SWE-bench, Terminal-Bench, or similar).

• Familiarity with **Docker** (including image building and container debugging).

• Exceptional attention to detail, particularly when specifying **exact, verifiable outputs**.

• Capability to design **complex, multi-step problem-solving workflows**.


🏝️ Benefits

• Opportunity to work on cutting-edge projects in a collaborative environment.

• Access to continuous learning and professional development resources.

• Flexible work hours in a fully remote role.

• Competitive salary and comprehensive benefits package.
