
AI Evaluation Engineer – Data Analysis, Multi-Agent Systems
Posted 17 hours ago

**Responsibilities**
• Design and create **benchmark tasks for multi-agent systems** that focus on intricate data analysis workflows.
• Develop or compile **realistic datasets** (including CSV, JSON, logs, reports, financial, or operational data).
• Construct tasks that require:
  - Cross-referencing multiple data sources.
  - Anomaly detection and identification of contradictions.
  - Statistical analysis and interpretation.
• Establish **task decomposition strategies** among specialized sub-agents (such as financial, technical, and operational analysis).
• Create **verification logic** that checks analytical outputs for accuracy (rather than accepting generic summaries).
• Implement evaluation pipelines utilizing **Python and SQL**.
• Develop reproducible environments using **Docker**.
• Assess task performance and refine tasks for **clarity, difficulty, and scoring precision**.

**Requirements**
• Over 5 years of experience in **data analysis or analytics-focused roles**.
• Strong expertise in **Python (pandas, NumPy)** and **SQL**.
• Experience in handling **real-world, complex datasets** (such as CSV, JSON, logs, reports).
• Ability to design **analytical challenges with clear and verifiable outcomes**.
• Comprehensive knowledge of **statistics** (including distributions, correlations, and outliers).
• Familiarity with **AI benchmarks or evaluation environments** (for instance, SWE-bench or similar).
• Practical experience with **Docker** (including Dockerfiles, image builds, and debugging).

**Nice to Have**
• Experience in **financial analysis, operations analytics, or risk analysis**.
• Exposure to **data pipelines or ETL workflows**.
• Experience with **data quality validation or anomaly detection systems**.
• Familiarity with **AI/ML data workflows or evaluation frameworks**.

**Benefits**
• Competitive salary and performance bonuses.
• Flexible working hours and remote work options.
• Opportunities for professional development and continuing education.
• Collaborative and innovative work environment.
• Health and wellness benefits.