
Cloud Data Architect, AI Experience
Posted 6 days ago

This is a fully remote position, open to applicants in India.
• Lead the design and management of the enterprise AI data platform — a unified and governed framework that ingests, transforms, stores, and delivers all data utilized by AI systems throughout the organization.
• Create multi-domain data models (lakehouse, data mesh, event-driven) that are structured from the outset to support AI workloads: clean lineage, versioned schemas, well-documented contracts, and high-performance serving APIs.
• Oversee the complete data stack: real-time streaming (Kafka, Spark Structured Streaming), batch processing (Databricks, PySpark, Delta Lake), cloud storage and compute (AWS, Azure), and data quality and metadata management.
• Guarantee that this platform acts as the sole, authoritative data source for all downstream users — conversational AI, dashboard assistants, autonomous agents, ML models, and reporting — thereby eliminating data silos and inconsistent truths.
• Drive the modernization of legacy pipelines (on-prem ETL, batch DWH) to cloud-native, AI-ready architectures with measurable improvements in cost, latency, and delivery speed.
• Develop the semantic layer that sits above raw data (business-aligned ontologies, entity relationships, domain taxonomies, and knowledge graphs) so that AI systems can understand context, not merely tokens.
• Build and maintain knowledge graphs (Neo4j or equivalent) that capture relationships among business entities, policies, KPIs, hierarchies, and domain rules, enabling structured reasoning alongside unstructured retrieval.
• Define and govern a feature store and semantic data contracts that serve both classical ML models and LLM-based applications from a single, well-versioned, trusted source.
• Manage metadata, data lineage, and audit trails throughout the semantic layer — ensuring every AI system can trace its outputs back to the source data with complete accountability.
• Design and implement a robust data governance framework that regulates access for both human users and AI agents — utilizing role-based access control (RBAC), attribute-based policies, and agent-specific permission scopes that prevent privilege escalation.
• Over 15 years of practical experience in data engineering and architecture, including 3–5+ years dedicated to building production AI/ML and LLM-era data infrastructure.
• Demonstrated experience in designing enterprise-scale AI data platforms that serve multiple AI consumers rather than a single application or pipeline.
• Extensive expertise in lakehouse and data mesh architectures: Databricks, Delta Lake, PySpark, Kafka, Spark Structured Streaming, and cloud-native data services (AWS, Azure).
• Practical experience with vector stores, semantic models, knowledge graphs, and retrieval infrastructure in live environments.
• Familiarity with LLMOps: model serving pipelines, MLflow, CI/CD for AI, automated evaluation, and production monitoring.
• A solid background in data governance, security, and compliance within regulated sectors (financial services, payments, cybersecurity, healthcare).
• Experience in establishing data access controls for AI agents and automated systems, not only for human users.
• Health insurance
• Flexible work hours
• Professional development opportunities
Aimpoint Digital