Remotery

Senior Software Engineer – AI Evaluation & Benchmarks, Python

at G2i Inc. | US (Florida) | Freelance | Backend Engineer | Senior | $80 – $100/hour

Posted 22 hours ago

📋 Description

• Create and develop coding benchmarks and evaluation pipelines used to assess frontier AI models on real software engineering tasks:

• Design coding benchmarks that test frontier models on practical programming challenges — including reasoning, debugging, and production-level code.

• Develop and sustain scalable data pipelines for evaluation processes.

• Evaluate model-generated code for accuracy, reliability, and edge-case failures.

• Create structured evaluation scenarios across extensive repositories and multi-language environments.

• Offer comprehensive technical feedback on model performance and failure patterns.

• Contribute to evaluation frameworks that establish standards for measuring coding capabilities.

• The ultimate objective: benchmarks that effectively differentiate what frontier models can and cannot achieve — influencing how the next generation is trained and refined.

• AI coding evaluation summarized: Design task → build harness → execute model → analyze failures → integrate findings back into the benchmark → evaluations that truly differentiate robust models from weaker ones.
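For readers unfamiliar with this kind of work, the loop summarized above (design task, build harness, execute model, analyze failures) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not G2i's tooling: the `Task` class, the two stub "models", and the sample tasks are all hypothetical, and a real harness would run generated code in a sandbox rather than with `exec`.

```python
from dataclasses import dataclass


@dataclass
class Task:
    """A hypothetical benchmark task: a prompt plus hidden test cases."""
    name: str
    prompt: str
    entry_point: str   # name of the function the model must define
    cases: list        # list of (args_tuple, expected_result)


def run_task(task, model):
    """Run one task against a model; return a list of failure messages."""
    source = model(task.prompt)
    namespace = {}
    try:
        # Assumption: trusted stub code only. Real harnesses sandbox this step.
        exec(source, namespace)
        fn = namespace[task.entry_point]
    except Exception as exc:
        return [f"load error: {exc!r}"]
    failures = []
    for args, expected in task.cases:
        try:
            got = fn(*args)
        except Exception as exc:
            failures.append(f"{args}: raised {exc!r}")
            continue
        if got != expected:
            failures.append(f"{args}: expected {expected!r}, got {got!r}")
    return failures


def evaluate(tasks, model):
    """Map each task name to its list of failures (empty list means pass)."""
    return {task.name: run_task(task, model) for task in tasks}


def pass_rate(report):
    """Fraction of tasks with no failures."""
    return sum(1 for fails in report.values() if not fails) / len(report)


# Two toy tasks whose edge cases separate a careful solution from a sloppy one.
TASKS = [
    Task("clamp", "Write clamp(x, lo, hi).", "clamp",
         [((5, 0, 10), 5), ((15, 0, 10), 10), ((-3, 0, 10), 0)]),
    Task("last_or_none", "Write last_or_none(xs).", "last_or_none",
         [(([1, 2, 3],), 3), (([],), None)]),
]

# Stub "models": canned answers keyed by prompt, standing in for real LLM calls.
STRONG = {
    "Write clamp(x, lo, hi).":
        "def clamp(x, lo, hi):\n    return max(lo, min(x, hi))\n",
    "Write last_or_none(xs).":
        "def last_or_none(xs):\n    return xs[-1] if xs else None\n",
}
WEAK = {
    "Write clamp(x, lo, hi).":
        "def clamp(x, lo, hi):\n    return min(x, hi)\n",   # ignores lower bound
    "Write last_or_none(xs).":
        "def last_or_none(xs):\n    return xs[-1]\n",       # crashes on []
}


def strong_model(prompt):
    return STRONG[prompt]


def weak_model(prompt):
    return WEAK[prompt]


strong_report = evaluate(TASKS, strong_model)
weak_report = evaluate(TASKS, weak_model)
```

Inspecting `weak_report` shows which edge case broke each solution, which is the kind of failure analysis that would feed back into the next revision of the benchmark.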


⛳️ Requirements

• A minimum of 4 years of professional software engineering experience (mandatory).

• Proficient in Python — producing clean, efficient, and thoroughly tested code.

• Practical experience with large, complex codebases.

• Demonstrated experience in designing and implementing LLM coding benchmarks and evaluation data pipelines.

• Strong proficiency in Git and modern development workflows.

• Proven track record at a high-growth tech company or a prestigious software organization.

• Excellent written communication skills in English.

• Identity verification: Applicants are required to verify their identity and possess valid documentation to work as an independent contractor.


🏝️ Benefits

• Work as an independent contractor from your home country (identity verification is required).

• Weekly payments via PayPal or Stripe.
