
Staff Engineer – SDET, Pytest, Agentic Systems
Posted May 30

This is a fully remote position, open to applicants in India.
• Responsible for establishing and implementing the technical quality assurance strategy for Agentic AI solutions.
• Concentrate on testing intricate orchestrators and sub-agent architectures developed in Python using open-source frameworks.
• Ensure the reliability, precision, and efficiency of multi-agent systems while optimizing operational costs and system performance.
• Design and create comprehensive test suites tailored for multi-agent architectures.
• Develop automated monitors and test cases to track token usage and to identify and mitigate redundant calls.
• Conduct thorough performance testing to assess and enhance end-to-end latency.
• Construct and maintain automated evaluation pipelines utilizing metrics to validate LLM outputs.
• Evaluate the decision-making capabilities of the orchestrator and manage edge cases effectively.
• Create Python-based automation frameworks for handling non-deterministic AI outputs.
• Integrate AI-specific testing gates into DevOps pipelines.
• Total Experience: 5–9 years in Software Development Engineer in Test (SDET) or Quality Engineering positions.
• AI/LLM Experience: At least 1 year of hands-on experience in testing LLM-based applications, RAG pipelines, or Agentic workflows.
• Framework Experience: Demonstrated experience with AWS Bedrock AgentCore and/or Strands. Comparable experience with LangChain, LangGraph, LlamaIndex, or Google ADK (Agent Development Kit) is also acceptable.
• Agentic Systems: Direct experience in constructing or testing systems involving multi-agent coordination, tool-use (function calling), and autonomous planning.
• Cloud Experience: Strong familiarity with AWS services (Lambda, CloudWatch, Bedrock) or equivalent services from Google Cloud/Azure AI.
• High proficiency in Python, including experience in asynchronous programming.
• In-depth understanding of agentic patterns (ReAct, Plan-and-Execute) and the intricacies of testing non-deterministic systems.
• Capability to analyze logs and traces to pinpoint bottlenecks in agent reasoning and propose cost-saving measures in prompt design or model selection.
• Proficiency with Pytest and experience with observability/tracing tools such as LangSmith, AWS CloudWatch, or AWS X-Ray.
• Knowledge of NLP and LLM evaluation techniques, including employing "LLM-as-a-judge" for assessing complex sub-agent outputs.
• Exceptional analytical skills for diagnosing "hallucinations" or logical errors during the orchestrator’s planning phase.
• Strong verbal and written communication skills, with the ability to effectively convey technical risks associated with AI performance and costs to stakeholders.
• Health insurance
• 401(k) matching
• Flexible work hours
• Paid time off
• Professional development opportunities