
Senior Software Developer in Test, Python
Posted 6 days ago

This is a fully remote position, open to applicants in Poland.
Responsibilities:
• Analyze requirements and establish the testing strategy for new features and product modifications.
• Automate test scenarios using the existing framework built on Python and PyTest.
• Develop automated quality assessment pipelines for AI systems using metrics and LLM-as-judge methodologies.
• Test MCP servers, tool schemas, and tool-call behavior, including edge cases and invalid inputs.
• Assess agentic workflows, focusing on tool selection, multi-step reasoning, error management, loop recovery, and state accuracy.
• Maintain and improve the test automation framework and help develop internal testing tools, including mocks.
• Create and maintain test documentation, including checklists, test cases, and quality reports.
• Engage in test design, estimations, release testing, and product quality evaluations.
• Contribute to improvements in CI/CD and QA processes.
• Design and manage evaluation suites and golden datasets for RAG and agentic workflows.
• Execute adversarial testing for AI systems, addressing prompt injection, jailbreaks, tool misuse, and data leakage concerns.
• Establish regression checks for alterations in prompts, models, retrieval settings, and chunking strategies.
• Monitor the quality of AI systems alongside cost, latency, and token usage.
• Utilize tracing and observability tools to debug, assess, and enhance LLM application performance.
Requirements:
• Over 5 years of experience in Quality Assurance, encompassing both manual and automated testing.
• Strong grasp of QA principles, test design, test coverage, test pyramid, and Software Development Life Cycle (SDLC).
• Proficient with Python-based test automation frameworks, such as PyTest, Behave, or comparable tools.
• Familiarity with CI/CD and monitoring or alerting tools, like Datadog, ELK, Sentry, or similar.
• Passion for testing AI/LLM-based systems; hands-on experience is preferred, though quick learners eager to develop in this field are also welcome.
• Knowledge of RAG, LLM evaluation, and quality metrics such as groundedness, faithfulness, answer relevance, and retrieval quality.
• Experience or interest in AI evaluation tools, including RAGAS, DeepEval, promptfoo, LangSmith Eval, TruLens, Arize Phoenix, or similar resources.
• Understanding of how to test non-deterministic systems, where multiple correct outputs may exist.
• Familiarity with LangChain, LangGraph, MCP, vector databases, semantic search, or LLM observability tools is a significant advantage.
• Proficient in spoken and written English (B2 level or higher).
Benefits:
• Full-time employment.
• Private health insurance.
• One additional day off each calendar year.
• Compensation for sports programs.
• Comprehensive mental health program.
• Free online English classes with native speakers.
• Generous referral program.
• Training, internal workshops, and opportunities to participate in international professional conferences and corporate events.