
Senior AI Systems Quality Engineer
Posted 15 hours ago

This is a fully remote position, open to applicants in Massachusetts.
• Develop and deploy production-level, automated validation frameworks, test harnesses, and evaluation pipelines throughout the AI lifecycle (from design to deployment).
• Design and enhance an AI testing platform that integrates with Databricks and MLflow, facilitating repeatable testing, traceability, and auditability.
• Construct extensive, scenario-driven test suites (ranging from hundreds to thousands of cases) to validate agentic workflows comprehensively, addressing edge cases, long-tail scenarios, and failure modes.
• Validate orchestration behavior (tool usage, memory management, decision logic) and conduct stress tests on non-deterministic system behavior prior to production.
• Incorporate quality by design: establish system contracts, guardrails, and safe-degradation patterns at critical boundaries.
• Define quantifiable quality signals for LLM systems (such as grounding, hallucinations, relevance, latency, and cost) and integrate them into CI/CD pipelines as automated quality gates.
• Ensure that AI validation runs automatically on every model, prompt, and code change, so that quality is enforced continuously.
• Create reusable libraries and components to enable teams to swiftly adopt consistent AI quality practices.
• Take ownership of certain aspects of AI release readiness, including establishing go/no-go criteria based on measurable quality thresholds.
• Collaborate with AI, platform, security, and delivery teams to translate mission requirements into clear quality criteria, trade-offs, and confidence levels.
• Over 7 years of software engineering experience, with a focus on backend or platform systems.
• Demonstrated experience in designing and implementing AI testing automation within production settings, rather than merely executing tests.
• Proven capability to develop custom validation, evaluation, or testing frameworks for complex, distributed systems.
• Strong expertise in Python and/or TypeScript within contemporary AI engineering stacks.
• Practical experience with AI-driven systems, including LLM-based or agentic workflows and non-deterministic behavior.
• Experience in designing or contributing to AI testing at scale, encompassing regression frameworks, long-tail evaluation, and extensive test coverage.
• Deep understanding of CI/CD integration, particularly embedding automated tests and quality gates in deployment pipelines.
• Solid grasp of AWS cloud-native architectures.
• Proven history of engineering for quality, reliability, governance, and safety as fundamental system design principles.
• Familiarity with security, privacy, and operational risk in regulated or mission-critical environments, including failure modes and recovery strategies.
• Experience with AI testing methodologies, including evaluating non-deterministic outputs, drift detection, bias/fairness testing, and robust regression strategies.
• Proven ability to establish measurable trust thresholds for AI systems, including defining and operationalizing success metrics such as query accuracy, hallucination limits, explainability, and PHI-safe behavior as enforceable release criteria.
• Experience collaborating with domain experts to define correctness and real-world validation scenarios, enabling extensive, business-relevant test coverage that reflects actual production use cases rather than engineering perspectives alone.
• Unlimited paid time off – take time to recharge when needed.
• Work from anywhere – enjoy the flexibility to suit your lifestyle.
• Comprehensive health coverage – choose from multiple plan options.
• Equity for every employee – participate in our collective success.
• Growth-focused environment – your professional development is a priority here.
• Home office setup allowance – one-time support to help you get started.
• Monthly cell phone allowance – stay effortlessly connected.