
Principal AI Solutions Engineer
Posted 6 days ago

This is a fully remote position, open to applicants in Belgium.
• Design and enhance our multi-agent orchestration platform, currently based on Hermes / Multica.
• Create and implement voice AI pipelines, including STT (VibeVoice-ASR, Whisper), real-time TTS with streaming (VibeVoice-Realtime), VAD (Silero), and SIP/RTP telephony integration, targeting sub-300 ms end-to-end latency.
• Develop and sustain RAG pipelines that include retrieval quality measurement, re-ranking, and hybrid search utilizing both vector and keyword indexes.
• Establish the MCP server architecture and define tool-use contracts for both internal and third-party integrations.
• Fine-tune and assess LLMs (LoRA, QLoRA, DPO) for specific tasks in domains such as customer support, classification, and structured extraction.
• Manage the AI observability stack, incorporating Langfuse tracing, span-level LLM call instrumentation, cost tracking, and quality regression alerts.
• Set and enforce safeguards including hallucination detection, PII redaction, output safety scanning, and rate-limiting across multi-tenant deployments.
• Develop data ingestion, preprocessing, and feature pipelines that support model training and continuous learning.
• Lead CI/CD efforts for ML, including automated evaluation gating, shadow deployments, canary releases, and rollback triggers.
• Establish architectural standards for AI systems across the organization; conduct design reviews and own architectural decision records for significant choices.
• Mentor ML engineers and applied scientists, building the team's skills in production AI rather than only prototype development.
• Collaborate with Product and Commercial teams to convert business challenges into ML problem formulations with well-defined success metrics.
• Engage with external research collaborators and monitor emerging developments (arXiv, conference proceedings, open-source releases) to pinpoint signals that are worth productionizing.
• Over 8 years of experience in ML Engineering, Applied AI, or Research Engineering, with a minimum of 2 years in a lead or staff-level position.
• Extensive, hands-on experience with LLMs in production environments, including fine-tuning, RLHF/DPO, prompt engineering, RAG, and tool use.
• Proficient in Python and the essential ML stack: PyTorch, Transformers (HuggingFace), PEFT/LoRA.
• Practical experience with LLM inference serving technologies such as vLLM, TensorRT-LLM, or TGI in latency-sensitive production settings.
• Strong knowledge of agentic frameworks, including multi-agent coordination, tool-call orchestration, context/memory management, and observability (Langfuse, Opik, or similar).
• Experience in speech AI (ASR/TTS pipelines) or real-time audio systems is highly desirable.
• Solid grasp of MLOps concepts, including experiment tracking (MLflow/W&B), model registries, containerization (Docker/Kubernetes), and CI/CD practices for ML.
• Awareness of specific risks associated with LLMs, such as hallucination, prompt injection, data leakage, fairness, and privacy, along with strategies for mitigation in production.
• Excellent communication abilities, capable of drafting concise design documents, conducting effective architecture reviews, and explaining trade-offs to non-technical stakeholders.
• Health insurance
• 401(k) matching
• Flexible working hours
• Paid time off
• Remote work options