
AI Evaluation Engineer – Mathematics & Algorithms
Posted May 25

This is a fully remote position, open to applicants in Pakistan.
• Design and build **benchmark tasks for multi-agent systems** that require multi-step mathematical reasoning and algorithmic problem-solving.
• Generate **complex and decomposable problems** across various fields, including:
  - Competitive mathematics
  - Numerical analysis
  - Combinatorial optimization
  - Statistical inference
• Create **verification scripts** to ensure the accuracy of:
  - Numerical results (within specified tolerance levels)
  - Proof correctness and logical progression
  - Algorithmic outputs and their constraints
• Compose **clear and organized problem statements** utilizing precise notation and defined outcomes.
• Formulate **task decomposition strategies** suitable for parallel or multi-agent execution.
• Execute computational solutions and validation processes using Python.
• Work with containerized environments (Docker) to keep evaluations reproducible.
• Over 5 years of experience in mathematics, quantitative research, or computational science, with a background in competition mathematics or university-level mathematics.
• Proficient in Python programming, including libraries such as NumPy, SciPy, or SymPy (symbolic computation).
• Experience in writing mathematical proofs or formal derivations is essential.
• Capability to design problems that yield precise, verifiable answers — avoiding subjective or open-ended questions.
• Familiarity with AI coding benchmarks (SWE-bench, Terminal-bench).
• Proficient with Docker — including writing Dockerfiles, building images, and troubleshooting container-related issues.
• Strong understanding of numerical methods — including floating point tolerance, convergence criteria, and error bounds.
**Nice to Have**
• Experience in crafting competition math problems (AMC, AIME, Putnam, IMO).
• A background in **theoretical computer science or advanced mathematics research**.
• Exposure to **automated theorem proving or formal verification** techniques.
• Familiarity with AI reasoning benchmarks (GSM8K, MATH, AIME, GPQA, ARC-AGI).
• Experience in **large-scale numerical or scientific computing**.
• Opportunity to work on cutting-edge mathematical and computational challenges.
• Collaborative environment with a focus on innovation and problem-solving.
• Continuous learning and development opportunities in advanced mathematics and algorithm design.
• Flexibility in work arrangements, including options for remote work.