
Senior AI Test Engineer
Posted 6 days ago

This is a fully remote position, open to applicants in India.
• Convert the AI testing strategy into actionable test scenarios that encompass LLM outputs, document classification, extraction accuracy, agent workflows, and edge cases.
• Create adversarial and boundary test inputs to reveal hallucination, misclassification, and failure modes.
• Assess AI outputs for structure, consistency, accuracy, and readiness for production against established performance thresholds.
• Develop reusable Python-based evaluation frameworks, incorporating output validation, hallucination detection, and scoring systems.
• Create parameterized test scripts that can be reused across various features, models, and releases.
• Implement AI-as-Judge frameworks, including prompt design, scoring logic, and calibration, to ensure evaluation reliability.
• Integrate evaluation frameworks into CI/CD pipelines to facilitate continuous testing and deployment.
• Design and manage drift detection frameworks utilizing fixed baseline datasets and scheduled re-evaluations.
• Set thresholds to differentiate acceptable variations from performance degradation.
• Enable release gating by identifying regressions before production deployment.
• Construct and maintain ground truth datasets in collaboration with subject matter experts.
• Establish standards for classification, extraction accuracy, and acceptable output characteristics.
• Continuously update datasets to align with changing business requirements and use cases.
• Test agentic workflows comprehensively, validating data integrity, error propagation, and fallback behavior.
• Conduct API-level testing of AI pipeline endpoints using Python and Postman/Newman.
• Validate data persistence and integrity across system layers through SQL.
• Collaborate with engineering teams to guarantee testability, observability, and system reliability.
• Define and expand standardized AI evaluation patterns and reusable quality frameworks throughout VLabs.
• Contribute to enterprise-level AI quality standards and reference architectures.
• Ensure compliance with Responsible AI, data privacy, and governance requirements.
• Support the auditability, traceability, and transparency of AI outputs and evaluation processes.
• Convert evaluation findings into actionable insights for engineering, product, and business stakeholders.
• Assist in decision-making regarding model readiness, release risks, and performance trade-offs.
• Proactively identify risks, patterns, and systemic issues and escalate them as necessary.
• Over 7 years of experience in software testing, with at least 2–3 years dedicated to AI/ML-enabled systems in production settings.
• Demonstrated experience in designing and implementing AI evaluation frameworks and quality strategies.
• Proven success in building ground truth datasets, drift detection systems, and scalable evaluation pipelines.
• Experience in testing multi-step agentic workflows and AI-driven automation systems.
• Familiarity with fast-paced, iterative delivery environments.
• Background in regulated or compliance-focused environments is preferred.
• Advanced proficiency in Python programming for evaluation frameworks, batch processing, and data analysis.
• Experience with LLM evaluation tools such as deepeval, RAGAS, promptfoo, or similar.
• Strong skills in AI output validation, hallucination detection, and grounding checks.
• Knowledge of drift detection frameworks and statistical evaluation methods.
• Expertise in OCR, VLM, and document AI testing, including classification, extraction, and edge cases.
• Proficiency in API testing using Python (requests/httpx) and Postman/Newman.
• SQL experience for data validation and ensuring pipeline integrity.
• Familiarity with LangChain, LlamaIndex, or similar frameworks.
• Experience with cloud AI platforms such as Azure AI Foundry or AWS Bedrock is preferred.
• Health insurance
• Retirement plans
• Flexible work arrangements
• Professional development opportunities