This is a fully remote position, open to applicants in India.

📋 Description

• Convert the AI testing strategy into actionable test scenarios that encompass LLM outputs, document classification, extraction accuracy, agent workflows, and edge cases.

• Create adversarial and boundary test inputs to reveal hallucination, misclassification, and failure modes.

• Assess AI outputs for structure, consistency, accuracy, and readiness for production against established performance thresholds.

• Develop reusable Python-based evaluation frameworks, incorporating output validation, hallucination detection, and scoring systems.

• Create parameterized test scripts that can be reused across various features, models, and releases.

• Implement AI-as-Judge frameworks, including prompt design, scoring logic, and calibration, to ensure reliable evaluation.

• Integrate evaluation frameworks into CI/CD pipelines to facilitate continuous testing and deployment.

• Design and manage drift detection frameworks utilizing fixed baseline datasets and scheduled re-evaluations.

• Set thresholds to differentiate acceptable variations from performance degradation.

• Enable release gating by identifying regressions before production deployment.

• Construct and maintain ground truth datasets in collaboration with subject matter experts.

• Establish standards for classification, extraction accuracy, and acceptable output characteristics.

• Continuously update datasets to align with changing business requirements and use cases.

• Test agentic workflows comprehensively, validating data integrity, error propagation, and fallback behavior.

• Conduct API-level testing of AI pipeline endpoints using Python and Postman/Newman.

• Validate data persistence and integrity across system layers using SQL.

• Collaborate with engineering teams to guarantee testability, observability, and system reliability.

• Define and expand standardized AI evaluation patterns and reusable quality frameworks throughout VLabs.

• Contribute to enterprise-level AI quality standards and reference architectures.

• Ensure compliance with Responsible AI, data privacy, and governance requirements.

• Support the auditability, traceability, and transparency of AI outputs and evaluation processes.

• Convert evaluation findings into actionable insights for engineering, product, and business stakeholders.

• Assist in decision-making regarding model readiness, release risks, and performance trade-offs.

• Proactively identify risks, patterns, and systemic issues and escalate them as necessary.

⛳️ Requirements

• Over 7 years of experience in software testing, with at least 2–3 years dedicated to AI/ML-enabled systems in production settings.

• Demonstrated experience in designing and implementing AI evaluation frameworks and quality strategies.

• Proven success in building ground truth datasets, drift detection systems, and scalable evaluation pipelines.

• Experience in testing multi-step agentic workflows and AI-driven automation systems.

• Familiarity with fast-paced, iterative delivery environments.

• Background in regulated or compliance-focused environments is preferred.

• Advanced proficiency in Python programming for evaluation frameworks, batch processing, and data analysis.

• Experience with LLM evaluation tools such as deepeval, RAGAS, promptfoo, or similar.

• Strong skills in AI output validation, hallucination detection, and grounding checks.

• Knowledge of drift detection frameworks and statistical evaluation methods.

• Expertise in OCR, VLM, and document AI testing, including classification, extraction, and edge cases.

• Proficiency in API testing using Python (requests/httpx) and Postman/Newman.

• SQL experience for validating data and verifying pipeline integrity.

• Familiarity with LangChain, LlamaIndex, or similar frameworks.

• Experience with cloud AI platforms such as Azure AI Foundry or AWS Bedrock is preferred.

🏝️ Benefits

• Health insurance

• Retirement plans

• Flexible work arrangements

• Professional development opportunities

Senior AI Test Engineer
