
Senior Data Engineer – AI Ingestion Platform
Posted 6 days ago

This is a fully remote position, open to applicants in Argentina.
• Develop and manage the historical email ingestion pipeline using the Microsoft Graph API.
• Create a document ingestion pipeline for SharePoint/OneDrive with controlled folder access.
• Design and establish the PII minimization pre-processing layer.
• Construct the vector store indexing workflow (OpenSearch/Pinecone) ensuring data isolation per tenant.
• Define and implement the data processing schema, and write and maintain its documentation.
• Create the OCR routing orchestrator and integrate OCR services for scanned documents.
• Implement the raw text and content extraction layer for all supported document types.
• Define and prototype ingestion strategies (push versus pull), taking them from initial proof of concept to incremental nightly pipelines.
• Ensure that data lineage and audit traceability are integrated into pipeline outputs from the beginning.
• Over 6 years of experience in data engineering with a strong background in pipeline and ETL/ELT processes.
• Expertise in Python for developing data pipelines.
• Familiarity with Microsoft Graph API or comparable enterprise email/document APIs (M365, Exchange Online).
• Experience with AWS data services including S3, DynamoDB, Glue, and/or Lambda-based event-driven processing.
• Knowledge of PII detection and data minimization techniques (using regex, NER, or specific libraries).
• Background in constructing vector store indexing or semantic search pipelines.
• Options for remote work.
Aimpoint Digital
Power Digital Marketing