This is a fully remote position, open to applicants in Brazil.

• Develop and execute evaluation and testing frameworks for AI systems including LLMs, RAG, and agents;

• Establish quality metrics, benchmarks, and validation datasets for operational models;

• Streamline QA pipelines integrated into the development lifecycle (CI/CD);

• Detect and document failures, hallucinations, and behavioral regressions in models;

• Partner with engineering and product teams to define acceptance criteria for AI features;

• Oversee the quality of production systems and recommend ongoing enhancements.

• Proficiency in Python and familiarity with libraries in the AI/ML ecosystem (LangChain, LlamaIndex, Hugging Face, or comparable);

• Hands-on experience in testing and assessing LLM-based systems;

• Foundational knowledge in software engineering (version control, unit testing, REST APIs);

• Understanding of prompting concepts, RAG, and agent orchestration;

• Experience with AWS services (Lambda, S3, SageMaker, Bedrock);

• Working knowledge of observability and model monitoring tools;

• Familiarity with evaluation frameworks such as Ragas, DeepEval, or similar.

• Competitive salary and performance-based bonuses;

• Flexible work hours and remote work options;

• Comprehensive health benefits including medical, dental, and vision;

• Opportunities for professional development and continuous learning;

• Collaborative and inclusive company culture.

Mid-level AI Engineer – AI Systems QA
