
Senior Lead AI Engineer, Data
Posted May 20

This is a fully remote position, open to applicants in India.
• Oversee the design and execution of data pipelines that generate high-quality training data for AI models.
• Develop data curation workflows that convert raw enterprise data into labeled and validated datasets.
• Create frameworks for data quality including validation, profiling, anomaly detection, and lineage tracking.
• Enhance current anonymized data export pipelines to accommodate AI training workloads.
• Establish pipelines for synthetic data generation.
• Design schema mappings across more than 197 enterprise tables for feature extraction.
• Work closely with ML engineers to clarify training data format requirements.
• Set up a data catalog and manage metadata for AI training artifacts.
• Over 10 years of experience in software engineering, with at least 5 years focused on data engineering.
• Extensive experience with Apache Spark / PySpark and large-scale data processing.
• Proven track record in building ETL/ELT pipelines in cloud environments (managed Spark, object storage, managed ETL, or similar).
• Familiarity with data quality frameworks and data governance practices.
• Experience in data anonymization and privacy-preserving data processing techniques.
• Solid understanding of ML training data requirements.
• Proficiency in Python and SQL.
• Experience with data cataloging tools and metadata management systems.
• Bachelor’s or Master’s degree in Computer Science or equivalent experience.
• Experience in B2B SaaS environments with multi-tenant data is preferred.
• Cutting-edge Technology
• Supportive and Collaborative Work Culture
• Opportunity for Global Impact
Credo AI