Remotery

Senior Software Engineer – AI Evaluation & Benchmarks, Python

at G2i Inc. | US (Florida) | Freelance | Backend Engineer | Senior | $80 – $100/hour

Posted May 14

This is a fully remote position, open to applicants in Florida.

📋 Description

• Build coding benchmarks and evaluation pipelines used to assess frontier AI models on real software engineering tasks:

• Design coding benchmarks that test frontier models on practical programming challenges — including reasoning, debugging, and production-level code.

• Develop and maintain scalable data pipelines for evaluation processes.

• Evaluate model-generated code for accuracy, reliability, and edge-case failures.

• Create structured evaluation scenarios across extensive repositories and multi-language environments.

• Offer comprehensive technical feedback on model performance and failure patterns.

• Contribute to evaluation frameworks that establish standards for measuring coding capabilities.

• The ultimate objective: benchmarks that effectively differentiate what frontier models can and cannot achieve — influencing how the next generation is trained and refined.

• AI coding evaluation summarized: Design task → build harness → execute model → analyze failures → integrate findings back into the benchmark → evaluations that truly differentiate robust models from weaker ones.


⛳️ Requirements

• A minimum of 4 years of professional software engineering experience (mandatory).

• Proficient in Python — producing clean, efficient, and thoroughly tested code.

• Practical experience with large, complex codebases.

• Demonstrated experience in designing and implementing LLM coding benchmarks and evaluation data pipelines.

• Strong proficiency in Git and modern development workflows.

• Proven track record at a high-growth tech company or a prestigious software organization.

• Excellent written communication skills in English.

• Identity verification: Applicants are required to verify their identity and possess valid documentation to work as an independent contractor.


🏝️ Benefits

• Identity verification is required for independent contractors residing in their home country.

• Weekly payments via PayPal or Stripe.
