
Principal GenAI Data Engineer
Posted 1 hour ago

This is a fully remote position, open to applicants in the United States.
• Design and develop enterprise-scale GenAI data platforms for the ingestion, transformation, enrichment, and delivery of both structured and unstructured data.
• Create scalable data pipelines that ingest enterprise knowledge from a variety of sources, including documents, SaaS platforms, knowledge bases, collaboration tools, and databases.
• Establish the architecture for metadata extraction, chunking, enrichment, embedding generation, and knowledge-preparation workflows.
• Develop AI-ready data models and storage strategies tailored for vector, graph, and hybrid knowledge systems.
• Construct scalable processing pipelines for unstructured data, including text, images, PDFs, tables, and multimodal content.
• Proficiency in Python, with expert-level software engineering skills.
• Proven experience in building distributed and scalable data pipelines for AI applications.
• Strong understanding of unstructured data extraction and processing pipelines.
• Familiarity with vector databases, graph databases, and systems for storing metadata/knowledge.
• Practical experience with clustering, entity recognition algorithms, and contemporary retrieval strategies (including RAG, search, and agentic AI workflows).
• A range of health insurance plans.
• Time-off policies for vacations and sick leave.
• Options for parental leave.
• Retirement plan options.
• Education reimbursement programs.
• In-office perks, among others!