
MLOps, LLMOps Engineer
Posted 1 day ago

• Design, automate, and manage scalable ML and LLM systems within Irth’s enterprise Lakehouse platform.
• Collaborate closely with Data Science, Engineering, and Product teams to implement dependable, secure, and production-ready ML and GenAI solutions.
• Focus on operationalizing ML models, building CI/CD pipelines, ensuring governance and compliance, and maintaining high-performance, observable AI systems.
• Operationalize the processes of model training, evaluation, packaging, and deployment utilizing Databricks, Delta Lake, and medallion architecture.
• Implement Unity Catalog model governance, lineage tracking, and access control measures.
• Create reusable job templates, cluster policies, and standardized deployment methodologies.
• Deploy and oversee ML and GenAI solutions, including risk scoring, anomaly detection, predictive maintenance, NLP, and RAG pipelines.
• Construct and enhance LLM pipelines using vector databases, model serving endpoints, and inference workflows.
• Optimize models through quantization, caching, and performance tuning methods.
• Establish batch and real-time inference pipelines with defined SLAs.
• Implement data contracts, schema validations, and data quality assessments across ML pipelines.
• Ensure secure management of sensitive data, including PII detection, classification, and obfuscation.
• Maintain complete lineage tracing from data sources to deployed models and serving endpoints.
• Enforce data residency, governance, and compliance protocols.
• Establish CI/CD pipelines using GitHub Actions and Databricks Asset Bundles.
• Automate deployments across DEV, QA, and PROD environments.
• Develop unit and integration tests for data pipelines and ML models.
• Ensure version control, reproducibility, and automated deployment processes.
• Monitor pipeline health, model performance, drift, and system reliability.
• Implement alerting systems, incident response workflows, and automated ticketing.
• Track LLM performance metrics, including latency, hallucination rates, and API costs.
• Create runbooks, disaster recovery protocols, and operational documentation.
• Apply tagging policies and cost tracking for ML infrastructure.
• Assist in budget monitoring, cost optimization, and resource management.
• 3–5 years of experience in MLOps, LLMOps, or ML platform engineering roles.
• Practical experience with Databricks, Delta Lake, Unity Catalog, and ML deployment workflows.
• Strong background in CI/CD pipelines utilizing GitHub Actions and infrastructure automation.
• Experience in implementing data quality validation, schema governance, and data contracts.
• Proven experience in developing production-grade ML pipelines with monitoring and observability.
• Strong security knowledge, including RBAC, encryption, data residency, and governance practices.
• Proficiency in Python, SQL, and distributed data processing frameworks.
• Experience with LLM pipelines, prompt engineering, RAG workflows, and model optimization is preferred.
• Familiarity with vector databases, model serving, and MLflow is preferred.
• Experience with Azure and AWS cloud platforms, including security and networking, is preferred.
• Background in disaster recovery, FinOps, and enterprise-scale ML operations is preferred.
• Familiarity with Power BI, semantic layers, and enterprise analytics platforms is preferred.
• Be a vital part of a dynamic, expanding company that is highly regarded in its industry.
• Competitive salary based on experience.
