This is a fully remote position, open to applicants in the United States.

📋 Description

• Innovate Using Advanced AI Technologies: Develop and implement state-of-the-art AI solutions for core Document Understanding tasks, including OCR/HTR (handwritten text recognition), transcription, Named Entity Recognition (NER), Relation Extraction (RE), Coreference Resolution, Summarization, and Knowledge Graph construction. You will work across a variety of genealogical and historical collections, such as newspapers, city directories, family history books, and vital records (e.g., birth, marriage, and death records).

• Analyze and Enhance Multi-Modal Models: Assess the performance of multi-modal models in zero-shot and few-shot settings for end-to-end document comprehension.

• Design Agentic Systems: Create and implement multi-agent workflows using frameworks such as LangChain, LangGraph, CrewAI, or AutoGen to automate complex, multi-step reasoning tasks in historical document analysis.

• Evaluate and Monitor Models: Set up "LLM-as-a-Judge" frameworks and use tools such as Arize Phoenix, DeepEval, or RAGAS to monitor for hallucinations, drift, and bias.

• Collaborate on Cloud Implementation: Work closely with MLOps and Data Science Engineers to deploy datasets, models, and pipelines efficiently in cloud environments.

• Communicate Findings Effectively: Present your insights, deliverables, and proposed solutions clearly and confidently to both technical and non-technical audiences, including teams, stakeholders, and executives.

⛳️ Requirements

• Currently enrolled in an advanced degree program (Master's or PhD preferred) in Computer Science, Data Science, Statistics, Mathematics, Linguistics, Engineering, or a related quantitative discipline with a strong emphasis on data.

• Specialization in AI and LLMs, with knowledge of foundation models such as GPT, Gemini, Qwen, Llama, and Claude.

• Experience with inference optimization and efficient fine-tuning techniques, such as vLLM, LoRA, QLoRA, and quantization.

• Familiarity with embeddings, vector databases, and transformer models, along with software development experience.

• Strong proficiency in Python and relevant libraries for transformer models, multi-modal models, and general NLP (e.g., Hugging Face Transformers), as well as agentic frameworks and workflows (e.g., LangChain, LangGraph, CrewAI, AgentCore).

• Knowledge of cloud platforms and their AI/ML services, such as Google Cloud Platform (GCP: Gemini API, Vertex AI) and AWS (EC2, S3, SageMaker, Model Registry, Bedrock), is a plus.

🏝️ Benefits

• Flexible work arrangements

• Professional development opportunities

Data Science – AI Document Understanding, Co-op

People also viewed

Data Collaboration Lead

Lead Data Scientist

Senior Data Scientist, Product Analytics

Head of Data and Insights

Product Manager – MS, Data

Staff Data Scientist – Digital Assets