This is a fully remote position, open to applicants in Pakistan.

📋 Description

• Develop and oversee the comprehensive QA strategy for the Conversational Banking Platform, which includes functional, regression, performance, security, and AI-specific assessments.

• Build and maintain golden datasets, evaluation suites, and LLM-as-judge frameworks to ensure conversational quality across intents, languages, and tenants.

• Establish the QA gate for tenant onboarding, along with the certification checklist that each new business unit must complete prior to going live.

• Formulate regression strategies for modifications to prompts, upgrades to models, updates to retrieval indexes, and alterations in guardrail policies.

• Leverage Langfuse traces for evaluations: analyze production failures, transform them into test cases, and provide feedback to engineering.

• Test NeMo Guardrails configurations against jailbreaks, prompt injection, and off-topic drift, and check for false-positive over-blocking.

• Validate governance and compliance controls, including data residency, PII handling, disclosures for regulated products, and off-limits topics.

• Develop automated testing harnesses for Spring AI services, which encompass tool-calling validation, RAG groundedness, and integration with Cosmos DB and MongoDB data layers.

• Collaborate with the Platform team to establish quality metrics, SLOs, and the platform evaluation scorecard.

• Mentor feature engineers and tenant teams on crafting their own evaluations, promoting self-service quality at the platform level over time.

⛳️ Requirements

• A minimum of 6 years of experience in software QA, with at least 1–2 years dedicated to testing LLM-based, RAG, or conversational AI systems in a production environment.

• Practical experience with LLM observability and evaluation tools such as Langfuse, LangSmith, Arize, or Phoenix.

• Familiarity with evaluation frameworks like Ragas, DeepEval, Promptfoo, or TruLens, including metrics such as faithfulness, groundedness, answer relevance, and context precision.

• A solid understanding of how to test non-deterministic systems: golden datasets, semantic similarity, LLM-as-judge, and statistical regression detection.

• Experience with testing guardrail or policy frameworks (like NeMo Guardrails, Guardrails AI, or similar solutions).

• Strong foundation in API testing, automation frameworks (e.g., pytest, JUnit, Karate, RestAssured), and CI/CD integration.

• Familiarity with Spring and Spring Boot applications as well as JVM-based services.

• Proficiency in writing queries against NoSQL databases (MongoDB, Cosmos DB) for setting up test data and inspecting traces.

• Excellent written communication skills: capable of producing clear test plans, defect reports, and tenant readiness assessments.

• Experience in banking, financial services, or other regulated industries is preferred.

• Exposure to multi-tenant platforms, including an understanding of how shared infrastructure complicates testing.

• Familiarity with red-teaming, adversarial prompt testing, and defenses against prompt injection.

• Working knowledge of vector databases, embedding models, and retrieval evaluation methodologies.

• Experience with multi-language conversational systems.

• Performance and load testing experience specifically for AI workloads (including token throughput, latency percentiles, and cost per conversation).

• Contributions to open-source evaluation or AI testing tools.

• Experience collaborating with compliance, risk, or audit teams on AI assurance initiatives.

🏝️ Benefits

• Comprehensive health, dental, and vision insurance.

• Flexible working hours and remote work options.

• Opportunities for professional development and career advancement.

• Engaging work environment and collaborative team culture.

Senior SQA Engineer – LLM
