
MLOps, LLMOps Engineer
Posted 1 day ago

• Design, automate, and manage scalable ML and LLM systems within Irth’s enterprise Lakehouse platform.
• Collaborate closely with Data Science, Engineering, and Product teams to implement dependable, secure, and production-ready ML and GenAI solutions.
• Focus on operationalizing ML models, building CI/CD pipelines, ensuring governance and compliance, and maintaining high-performance, observable AI systems.
• Operationalize the processes of model training, evaluation, packaging, and deployment utilizing Databricks, Delta Lake, and medallion architecture.
• Implement Unity Catalog model governance, lineage tracking, and access control measures.
• Create reusable job templates, cluster policies, and standardized deployment methodologies.
• Deploy and oversee ML and GenAI solutions, including risk scoring, anomaly detection, predictive maintenance, NLP, and RAG pipelines.
• Construct and enhance LLM pipelines using vector databases, model serving endpoints, and inference workflows.
• Optimize models through quantization, caching, and performance tuning methods.
• Establish batch and real-time inference pipelines with defined SLAs.
• Implement data contracts, schema validations, and data quality assessments across ML pipelines.
• Ensure secure management of sensitive data, including PII detection, classification, and obfuscation.
• Maintain complete lineage tracing from data sources to deployed models and serving endpoints.
• Enforce data residency, governance, and compliance protocols.
• Establish CI/CD pipelines using GitHub Actions and Databricks Asset Bundles.
• Automate deployments across DEV, QA, and PROD environments.
• Develop unit and integration tests for data pipelines and ML models.
• Ensure version control, reproducibility, and automated deployment processes.
• Monitor pipeline health, model performance, drift, and system reliability.
• Implement alerting systems, incident response workflows, and automated ticketing.
• Track LLM performance metrics, including latency, hallucination rates, and API costs.
• Create runbooks, disaster recovery protocols, and operational documentation.
• Apply tagging policies and cost tracking for ML infrastructure.
• Assist in budget monitoring, cost optimization, and resource management.
• 3–5 years of experience in MLOps, LLMOps, or ML platform engineering roles.
• Practical experience with Databricks, Delta Lake, Unity Catalog, and ML deployment workflows.
• Strong background in CI/CD pipelines utilizing GitHub Actions and infrastructure automation.
• Experience in implementing data quality validation, schema governance, and data contracts.
• Proven experience in developing production-grade ML pipelines with monitoring and observability.
• Strong security knowledge, including RBAC, encryption, data residency, and governance practices.
• Proficiency in Python, SQL, and distributed data processing frameworks.
• Experience with LLM pipelines, prompt engineering, RAG workflows, and model optimization is preferred.
• Familiarity with vector databases, model serving, and MLflow is preferred.
• Experience with Azure and AWS cloud platforms, including security and networking, is preferred.
• Background in disaster recovery, FinOps, and enterprise-scale ML operations is preferred.
• Familiarity with Power BI, semantic layers, and enterprise analytics platforms is preferred.
• Be a vital part of a dynamic, expanding company that is highly regarded in its industry.
• Competitive salary based on experience.
