Remotery

Senior AI Systems Quality Engineer

Posted 15 hours ago

This is a fully remote position, open to applicants in Massachusetts.

📋 Description

• Develop and deploy production-level, automated validation frameworks, test harnesses, and evaluation pipelines throughout the AI lifecycle (from design to deployment).

• Design and enhance an AI testing platform that integrates with Databricks and MLflow, facilitating repeatable testing, traceability, and auditability.

• Construct extensive, scenario-driven test suites (ranging from hundreds to thousands of cases) to validate agentic workflows comprehensively, addressing edge cases, long-tail scenarios, and failure modes.

• Validate orchestration behavior (tool usage, memory management, decision logic) and conduct stress tests on non-deterministic system behavior prior to production.

• Incorporate quality by design: establish system contracts, guardrails, and safe-degradation patterns at critical boundaries.

• Define quantifiable quality signals for LLM systems (such as grounding, hallucinations, relevance, latency, and cost) and integrate them into CI/CD pipelines as automated quality gates.

• Ensure that AI validation runs automatically whenever models, prompts, or code change, so that quality is enforced continuously.

• Create reusable libraries and components to enable teams to swiftly adopt consistent AI quality practices.

• Take ownership of certain aspects of AI release readiness, including establishing go/no-go criteria based on measurable quality thresholds.

• Collaborate with AI, platform, security, and delivery teams to translate mission requirements into clear quality criteria, trade-offs, and confidence levels.


⛳️ Requirements

• Over 7 years of software engineering experience, with a focus on backend or platform systems.

• Demonstrated experience in designing and implementing AI testing automation within production settings, rather than merely executing tests.

• Proven capability to develop custom validation, evaluation, or testing frameworks for complex, distributed systems.

• Strong expertise in Python and/or TypeScript within contemporary AI engineering stacks.

• Practical experience with AI-driven systems, including LLM-based or agentic workflows and non-deterministic behavior.

• Experience in designing or contributing to AI testing at scale, encompassing regression frameworks, long-tail evaluation, and extensive test coverage.

• Profound understanding of CI/CD integration, particularly in embedding automated tests and quality gates within deployment pipelines.

• Solid grasp of AWS cloud-native architectures.

• Proven history of engineering for quality, reliability, governance, and safety as fundamental system design principles.

• Familiarity with security, privacy, and operational risk in regulated or mission-critical environments, including failure modes and recovery strategies.

• Experience with AI testing methodologies, including evaluating non-deterministic outputs, drift detection, bias/fairness testing, and robust regression strategies.

• Proven ability to establish measurable trust thresholds for AI systems, including defining and operationalizing success metrics such as query accuracy, hallucination limits, explainability, and PHI-safe behavior as enforceable release criteria.

• Experience collaborating with domain experts to define correctness and real-world validation scenarios, enabling extensive, business-relevant test coverage that reflects real production use cases rather than engineering perspectives alone.


🏝️ Benefits

• Unlimited paid time off – take time to recharge when needed.

• Work from anywhere – enjoy the flexibility to suit your lifestyle.

• Comprehensive health coverage – choose from multiple plan options.

• Equity for every employee – participate in our collective success.

• Growth-focused environment – your professional development is a priority here.

• Home office setup allowance – one-time support to help you get started.

• Monthly cell phone allowance – stay effortlessly connected.
