This is a fully remote position, open to applicants in India.

📋 Description

• Lead the design and management of the enterprise AI data platform — a unified and governed framework that ingests, transforms, stores, and delivers all data utilized by AI systems throughout the organization.

• Create multi-domain data models (lakehouse, data mesh, event-driven) that are structured from the outset to support AI workloads: clean lineage, versioned schemas, well-documented contracts, and high-performance serving APIs.

• Oversee the complete data stack: real-time streaming (Kafka, Spark Structured Streaming), batch processing (Databricks, PySpark, Delta Lake), cloud storage and computing (AWS, Azure), along with data quality and metadata management.

• Ensure that this platform serves as the single, authoritative data source for all downstream users (conversational AI, dashboard assistants, autonomous agents, ML models, and reporting), eliminating data silos and conflicting versions of the truth.

• Drive the modernization of legacy pipelines (on-prem ETL, batch DWH) to cloud-native, AI-ready architectures with measurable improvements in cost, latency, and delivery speed.

• Develop the semantic layer that sits above raw data (business-aligned ontologies, entity relationships, domain taxonomies, and knowledge graphs) so that AI systems can understand context, not merely tokens.

• Build and maintain knowledge graphs (Neo4j or equivalent) that capture relationships among business entities, policies, KPIs, hierarchies, and domain rules, enabling structured reasoning alongside unstructured retrieval.

• Define and govern a feature store and semantic data contracts that cater to both classical ML models and LLM-based applications from a single, well-versioned, trusted source.

• Manage metadata, data lineage, and audit trails throughout the semantic layer — ensuring every AI system can trace its outputs back to the source data with complete accountability.

• Design and implement a robust data governance framework that regulates access for both human users and AI agents — utilizing role-based access control (RBAC), attribute-based policies, and agent-specific permission scopes that prevent privilege escalation.

⛳️ Requirements

• Over 15 years of practical experience in data engineering and architecture, including 3–5+ years dedicated to building production AI/ML and LLM-era data infrastructure.

• Demonstrated experience in designing enterprise-scale AI data platforms that accommodate multiple AI consumers — rather than just a single application or pipeline.

• Extensive expertise in lakehouse and data mesh architectures: Databricks, Delta Lake, PySpark, Kafka, Spark Structured Streaming, and cloud-native data services (AWS, Azure).

• Practical experience with vector stores, semantic models, knowledge graphs, and retrieval infrastructure in live environments.

• Familiarity with LLMOps: model serving pipelines, MLflow, CI/CD for AI, automated evaluation, and production monitoring.

• A solid background in data governance, security, and compliance within regulated sectors (financial services, payments, cybersecurity, healthcare).

• Experience in establishing data access controls for AI agents and automated systems — beyond just human users.

🏝️ Benefits

• Health insurance

• Flexible work hours

• Professional development opportunities

Cloud Data Architect, AI Experience
