
Data Science – AI Document Understanding, Co-op
Posted Jun 20
This is a fully remote position, open to applicants in the United States.
• Innovate using Advanced AI Technologies: Develop and implement state-of-the-art AI solutions for core Document Understanding tasks, including OCR/HTR, transcription, Named Entity Recognition (NER), Relation Extraction (RE), Coreference Resolution, Summarization, and Knowledge Graph construction. You will work with a variety of genealogical and historical collections, such as newspapers, city directories, family history books, and vital records (e.g., birth, marriage, and death records).
• Analyze and Enhance Multi-Modal Models: Assess the performance of multi-modal models in zero-shot and few-shot settings for comprehensive document understanding.
• Design Agentic Systems: Create and implement multi-agent workflows using frameworks such as LangChain, LangGraph, CrewAI, or AutoGen to automate complex, multi-step reasoning tasks in the analysis of historical documents.
• Evaluation and Monitoring: Set up "LLM-as-a-Judge" frameworks and use tools such as Arize Phoenix, DeepEval, or RAGAS to monitor for hallucinations, drift, and bias.
• Collaborate on Cloud Implementation: Work closely with MLOps and Data Science Engineers to deploy datasets, models, and pipelines efficiently in cloud environments.
• Communicate Findings Effectively: Present your insights, deliverables, and proposed solutions clearly and confidently to both technical and non-technical audiences, including teams, stakeholders, and executives.
• Currently enrolled in an advanced degree program (Master's or PhD preferred) in Computer Science, Data Science, Statistics, Mathematics, Linguistics, Engineering, or a related quantitative discipline with a strong emphasis on data.
• Specialization in AI and LLMs, with knowledge of foundation models such as GPT, Gemini, Qwen, Llama, and Claude.
• Experience with inference optimization (e.g., vLLM, quantization) and parameter-efficient fine-tuning (e.g., LoRA, QLoRA).
• Familiarity with embeddings, vector databases, and transformer models, along with software development experience.
• Strong proficiency in Python and relevant libraries and tools for transformer models, multi-modal models, and general NLP (e.g., Hugging Face Transformers), as well as agentic frameworks and workflows (e.g., LangChain, LangGraph, CrewAI, AgentCore).
• Knowledge of cloud platforms and their AI/ML services is advantageous, such as Google Cloud Platform (GCP) with the Gemini API and Vertex AI, or AWS with EC2, S3, SageMaker (including Model Registry), and Bedrock.
• Flexible work arrangements
• Professional development opportunities