
AI Evaluation Engineer – Software Engineering Domain
Posted Jun 3

This is a fully remote position, open to applicants in Egypt.
• Create realistic terminal-based benchmark tasks for evaluating AI systems.
• Develop in-depth debugging scenarios and investigation tasks.
• Formulate task specifications that encompass infrastructure, workflows, pipelines, or operational issues.
• Articulate clear solution strategies and definitive evaluation standards.
• Identify plausible edge cases, failure modes, and system limitations.
• Craft multi-step reasoning challenges within intricate technical environments.
• Offer expertise in one or more engineering or operational fields.
• Assess and enhance the quality, difficulty, and validation logic of benchmarks.
• Partner with reviewers and researchers on workflows for AI evaluation.
• 3–10 years of experience in software engineering or similar technical areas.
• Strong skills in debugging, analysis, and systems reasoning.
• Solid understanding of system architecture, dependencies, and operational workflows.
• Familiarity with terminal, CLI, automation, or developer tooling processes.
• Experience with AI systems, large language models, benchmarking, or evaluation frameworks is advantageous.
• Ability to design technically robust, realistic engineering scenarios.
• Competitive salary and performance-based bonuses.
• Opportunities for professional development and training.
• Flexible working hours and remote work options.
• Comprehensive health and wellness benefits.