
AI QA Engineer – Generative AI Quality and Evaluation
Posted 1 day ago

Responsibilities
• Design, validate, and enhance evaluation frameworks for AI agents.
• Implement automated and regression testing suites for generative models.
• Define and monitor quality metrics such as relevance, fidelity, consistency, accuracy, and hallucination rate.
• Build “LLM-as-a-Judge” evaluation systems (see the sketch after this list).
• Establish performance benchmarks for new models and existing agents.
• Validate updates for prompts, models, and RAG pipelines.
• Collaborate with AI and development teams to define acceptance criteria (pass/fail).
• Analyze evaluation results and propose continuous improvements.
• Produce metric reports and maintain traceability for agent quality.
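
As context for the “LLM-as-a-Judge” item above, here is a minimal sketch of the pattern: a judge model grades an answer against a rubric, and a threshold turns the grade into a pass/fail acceptance decision. Everything here (JUDGE_PROMPT, judge_answer, Verdict, the call_llm callable) is a hypothetical illustration, not any particular framework’s API.

```python
import re
from dataclasses import dataclass
from typing import Callable

# Hypothetical rubric; real judge prompts are tuned and versioned per use case.
JUDGE_PROMPT = """You are a strict evaluator. Given a question, retrieved context,
and a candidate answer, rate the answer's faithfulness to the context on a
scale of 1-5. Respond with only the number.

Question: {question}
Context: {context}
Answer: {answer}
Score:"""

@dataclass
class Verdict:
    score: int
    passed: bool

def judge_answer(
    question: str,
    context: str,
    answer: str,
    call_llm: Callable[[str], str],  # stand-in for your model API client
    threshold: int = 4,              # pass/fail acceptance criterion
) -> Verdict:
    """Ask a judge model for a 1-5 faithfulness score and apply a threshold."""
    raw = call_llm(JUDGE_PROMPT.format(question=question, context=context, answer=answer))
    match = re.search(r"[1-5]", raw)
    if match is None:
        raise ValueError(f"Judge returned no parsable score: {raw!r}")
    score = int(match.group())
    return Verdict(score=score, passed=score >= threshold)
```

Because the judge is itself non-deterministic, production versions typically pin the judge model version, request low temperature where the API allows it, and periodically check judge scores against human labels.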

Requirements
• Minimum of 3 years of experience in QA automation, Data/AI Quality, or AI system evaluation.
• Advanced proficiency in Python.
• Experience with AI evaluation frameworks such as RAGAS, DeepEval, or Vertex Gen AI Evaluation Service.
• Experience evaluating RAG systems and LLMs.
• Ability to design “LLM-as-a-Judge” systems.
• Experience in test automation and validation.
• Knowledge of prompt evaluation, response quality, model benchmarking, and generative AI testing.
• Familiarity with metrics such as groundedness, faithfulness, context relevance, and answer relevance (see the example after this list).
• Experience working with non-deterministic systems.
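
For the evaluation frameworks and metrics named above, a regression test in DeepEval typically looks like the sketch below: an LLMTestCase built from one RAG run, scored by faithfulness and answer-relevancy metrics whose thresholds act as pass/fail acceptance criteria. The class and function names follow DeepEval’s documented pytest-style API, but releases change, so verify them against the version you install; the question, answer, and context strings are invented.

```python
# Sketch of a DeepEval regression test (API names per DeepEval's docs; verify
# against your installed version). The metrics use a judge LLM under the hood,
# so credentials for the configured judge model are required.
from deepeval import assert_test
from deepeval.metrics import AnswerRelevancyMetric, FaithfulnessMetric
from deepeval.test_case import LLMTestCase

def test_rag_answer_quality():
    # In a real suite, input / actual_output / retrieval_context come from
    # running the RAG pipeline under test.
    test_case = LLMTestCase(
        input="What is the refund window?",
        actual_output="Refunds are accepted within 30 days of purchase.",
        retrieval_context=["Our policy allows refunds within 30 days of purchase."],
    )
    metrics = [
        FaithfulnessMetric(threshold=0.8),     # is the answer grounded in the context?
        AnswerRelevancyMetric(threshold=0.8),  # does the answer address the question?
    ]
    # Fails the test if any metric scores below its threshold, turning
    # metric thresholds into pass/fail gates in CI.
    assert_test(test_case, metrics)
```

Run it with pytest (DeepEval also ships a CLI wrapper around pytest); wiring it into CI gives the automated regression gate described in the responsibilities.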

Desirable
• Experience with conversational AI platforms.
• Knowledge of RAG pipelines.
• Experience with generative model APIs.
• Proficiency with observability and monitoring tools (see the sketch after this list).
• Knowledge of MLOps or LLMOps.
• Experience in cloud environments (GCP, AWS, or Azure).
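
On the observability side, much of the day-to-day work reduces to recording evaluation scores as structured events so that quality trends in a non-deterministic system are visible over time. A stdlib-only sketch; the file path, field names, and helper functions are illustrative, since real setups ship these records to a metrics backend instead of a local file.

```python
import json
import time
from pathlib import Path

# Hypothetical log location, chosen for illustration only.
EVAL_LOG = Path("eval_scores.jsonl")

def record_eval(model: str, metric: str, score: float, run_id: str) -> None:
    """Append one evaluation result as a structured JSON line."""
    record = {
        "ts": time.time(),
        "run_id": run_id,
        "model": model,
        "metric": metric,
        "score": score,
    }
    with EVAL_LOG.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")

def mean_score(metric: str, last_n: int = 50) -> float:
    """Average the most recent scores for one metric, e.g. to alert on drift."""
    rows = [json.loads(line) for line in EVAL_LOG.read_text().splitlines()]
    scores = [r["score"] for r in rows if r["metric"] == metric][-last_n:]
    if not scores:
        raise ValueError(f"no recorded scores for metric {metric!r}")
    return sum(scores) / len(scores)
```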

Benefits
• Work mode: 100% Remote.
• Excellent work environment.
• Opportunities for growth and participation in innovative projects.