
Staff Test Engineer – AI
Posted May 25
This is a fully remote position, open to applicants in India.
• Define and oversee the comprehensive testing strategy for Outreach’s GenAI platform, which encompasses agentic workflows, LLM tool calls, LangGraph orchestration, and associated ML pipelines.
• Design and implement evaluation systems capable of handling both deterministic and non-deterministic outputs.
• Own testing across Outreach’s portfolio of AI agents.
• Collaborate closely with Data Science, MLOps, and platform engineers to ensure that testability is integrated from the outset.
• Incorporate evaluation pipelines into CI/CD workflows.
• Establish and monitor key metrics relevant to AI systems, including answer quality scores, tool invocation accuracy, hallucination rates, latency, and regression trends associated with model and prompt changes.
• Set organization-wide standards for AI testing, covering prompt regression testing, retrieval quality evaluation, and agent behavior contracts.
• Elevate quality across engineering teams by mentoring engineers.
• Stay current with advances in AI evaluation tooling, LLM benchmarking, and testing research.
• 7–12 years of experience in software development and/or test automation, with a proven track record of leading quality initiatives on complex, distributed systems.
• B.S. in Computer Science or a related technical discipline.
• Strong Python programming skills, with experience developing reusable, maintainable test frameworks.
• Demonstrated experience in testing large-scale backend or platform systems, including microservices and API layers.
• In-depth understanding of test design principles, CI/CD integration, and scalable test automation.
• Familiarity with test frameworks such as pytest or equivalents.
• Comprehensive knowledge of evaluation methodologies for non-deterministic systems, including statistical assertions, behavioral testing, and regression baselines.
• Hands-on experience using Databricks to build and validate ML pipelines and data workflows.
• Experience with MLflow for experiment tracking, model versioning, and pipeline observability.
• Excellent communication and collaboration skills across engineering, data science, and product teams.
• We’re an equal opportunity employer. All applicants will be considered for employment without regard to race, color, religion, sex, sexual orientation, gender identity