
Principal Architect
Posted May 20

This is a fully remote position, open to applicants in Brazil.
• Assist the Engineering Manager (EM) with daily responsibilities.
• Lead the execution of daily delivery tasks (planning, managing dependencies, unblocking issues, risk management) and ensure that commitments are fulfilled.
• Uphold engineering standards (definition of done, testing strategies, code quality, documentation, and operational readiness).
• Provide transparent engineering updates to stakeholders and escalate issues promptly when necessary.
• Facilitate team execution without direct management: assist with onboarding and workflows, mentor/coach engineers and tech leads, and share best practices.
• Ensure knowledge transfer and proper handover: document decisions, create runbooks, and outline key architectural choices; facilitate a seamless transition at the conclusion of the engagement.
• Collaborate with Product/Business to identify areas where agentic AI adds clear value (workflow automation, assisted decision-making, content acceleration).
• Convert requirements into technical objectives (latency, cost, quality, robustness, compliance) and define success criteria (KPIs, A/B testing, guardrails).
• Design agentic architectures: tool orchestration, planning, memory management, context management, retrieval (RAG), routing, multi-agent patterns, and human-in-the-loop processes.
• Implement production patterns: prompt versioning, evaluation harnesses, regression tests, feature flags, canary releases, and monitoring for quality, cost, and latency.
• Engineer for resilience: incorporate fallbacks, timeouts, retries, circuit breakers, safe tool execution, sandboxing, and secrets management.
• Establish guardrails: create tool policies, implement content filtering, PII redaction, policy-as-code, access controls, auditability, and traceability (traces, conversations, decisions).
• Collaborate with IT/Security/Cloud teams to ensure privacy, security, and risk compliance at scale.
• Define and enforce quality gates prior to production (red-teaming, adversarial testing, and bias and hallucination risk management).
• Build and manage the ML value chain: data contracts, data quality, lineage, drift monitoring, dataset management, and training/inference pipelines.
• Oversee production operations for models and agentic services: deployment, scaling, observability, incident response, SLOs, and post-mortems.
• Industrialize continuous evaluation: conduct offline evaluations, create golden sets, establish human feedback loops, and develop scorecards.
• Coach tech leads and engineers on delivery practices, quality, and operational excellence (without direct people management).
• Enhance ways of working: focus on documentation, testing, product ownership (“you build it, you run it”), and incident readiness.
• Promote collaboration among software engineers, data engineers, data scientists, ML engineers, SRE, product managers, and security teams.
• Proven engineering leadership experience in delivering data/ML products into production (scalability, reliability, security).
• Strong understanding of modern LLM/agentic patterns: RAG, embeddings, tool-calling/function execution, memory management, evaluation, and tracing.
• Excellent foundational skills in software engineering (architecture, microservices, API design, testing, CI/CD) and cloud technologies (preferably Azure).
• Practical experience with MLOps/LLMOps: monitoring, deployment, governance, cost/latency optimization, and observability (metrics/logs/traces).
• Solid knowledge of data engineering (pipelines, orchestration, quality) and SRE practices (SLOs, incidents, runbooks).
• Servant leadership style with high standards and strong stakeholder alignment capabilities.
• Product-oriented mindset with a focus on impact (measurement, iteration, prioritization).
• Ability to clearly communicate complex topics (risks, trade-offs, technical roadmap).
• Comfort in navigating ambiguity and transforming emerging technologies (agentic/LLM) into actionable plans and standards.
• Nice to Have: Significant experience across software/data/ML, including notable technical leadership (people management experience is a plus but not mandatory).
• Proven experience in delivering ML/LLM solutions under real-world constraints (security, cost, performance, operations).
• Health and dental insurance
• Meal and food allowance
• Childcare assistance
• Extended paternity leave
• Partnerships with gyms and health and wellness professionals through Wellhub (Gympass) and TotalPass
• Profit and Results Sharing (PLR)
• Life insurance
• Access to a continuous learning platform (CI&T University)
• Membership in a discount club
• Free online platform dedicated to physical, mental, and overall well-being
• Pregnancy and responsible parenting course
• Collaborations with online learning platforms
• Language learning platform
• And many more!