
AI/LLM Safety Engineer
Posted 10 hours ago

This is a fully remote position, open to applicants in Kansas.
• Develop and maintain a safety evaluation framework, including adversarial prompt sets, scenario-based test suites, and regression suites, so that every model and agent update is validated before release.
• Run structured red-teaming exercises focused on jailbreaks, prompt injection, tool misuse, and data exfiltration; document findings and track each issue through to resolution.
• Build and refine guardrail logic, including input/output filtering, tool-boundary constraints, action validation, sensitive-data redaction, and policy prompting.
• Integrate safety checks into CI/CD pipelines and runtime paths to catch unsafe behavior before it reaches users.
• Threat-model agentic workflows, covering tool-call boundaries, sandbox isolation, and least-privilege access, with particular attention to preventing agents from exfiltrating data or taking irreversible actions through chained tool calls.
• Conduct safety reviews of reinforcement-learning (RL) environments and trajectory data, working with the environment and agent engineering teams to build safety constraints directly into the environments.
• Instrument AI features with structured logging, tracing, and metrics so that unsafe patterns and regressions can be detected in production.
• Compile evidence for governance reviews, including test reports, evaluation summaries, and mitigation validation, in alignment with internal Responsible AI standards.
• Work with Product and UX teams to improve safety-related user interactions (warnings, confirmations, refusal messaging, and feedback collection), and align evaluation goals with the Research and Data teams.
• Bachelor's or Master's degree in Computer Science, Software Engineering, Cybersecurity, or a comparable technical discipline—or equivalent practical experience.
• 4+ years of experience building production software, including direct work on, or security work for, ML/LLM systems.
• Strong software engineering skills, with the ability to write production-grade code (primarily in Python) that goes beyond scripting or notebook prototyping.
• Solid understanding of LLMs and ML: how models work, prompt engineering, and the safety implications of fine-tuning and RAG (e.g., unsafe retrieval, tool misuse, and data exfiltration).
• A security mindset with demonstrated ability to threat-model AI workflows, plus working knowledge of access control, data retention, and incident response fundamentals.
• Familiarity with the LLM attack landscape (prompt injection, jailbreaks, data poisoning, and supply-chain risk), including working knowledge of the OWASP Top 10 for LLM Applications.
• Hands-on experience in at least one area of safety evaluation or red teaming, and the ability to walk through a real finding and how it was remediated.
• Health insurance
• Paid time off
• Flexible work arrangements
• Professional development
• Stock options