This is a fully remote position, open to applicants in Costa Rica.

• Design, construct, and sustain evaluation pipelines for production AI agent systems.

• Integrate multi-agent workflows with tracing and observability tools.

• Create evaluation datasets from real production traffic and interaction logs.

• Develop quality and robustness scoring systems for LLM outputs.

• Enhance the reliability of AI systems dealing with non-deterministic model behavior.

• Implement and refine HITL (Human-in-the-Loop) escalation workflows.

• Investigate production failures and drive architectural enhancements.

• Manage the complete feedback loop encompassing evaluations, prompt optimization, architecture updates, and re-testing.

• Contribute to strategies for prompt engineering and model optimization.

• Collaborate on decisions regarding multi-agent orchestration and workflow reliability.

• Work across backend systems, deployment pipelines, monitoring, and operational support.

• Engage in production support and on-call duties.

• Uphold high engineering standards concerning scalability, observability, and maintainability.

• Function autonomously across development, testing, deployment, and production ownership.

• Over 5 years of backend or AI engineering experience in production settings.

• Significant hands-on experience with production LLM or agentic AI systems.

• Proven ability to debug and maintain non-deterministic AI workflows under live conditions.

• Experience building or managing evaluation (evals) pipelines for AI systems.

• Strong comprehension of scorer design, feedback loops, and AI system evaluation methodologies.

• Excellent skills in Python backend engineering.

• Production experience with frameworks such as FastAPI, Django, Celery, LangGraph, or similar orchestration tools.

• Familiarity with observability and tracing tools including Langfuse, Grafana, Loki, OpenTelemetry, or equivalent.

• Experience in deploying and managing distributed backend systems.

• Strong understanding of AI reliability, prompt behavior, and handling model failures.

• Ability to independently manage projects from start to finish.

• Experience collaborating with asynchronous remote teams.

• Strong written communication skills in English.

• Fully Remote

• LATAM-friendly collaboration preferred

Applied AI Engineer
