Remotery

Data Science – AI Document Understanding, Co-op

Posted Jun 20

This is a fully remote position, open to applicants in the United States.

📋 Description

• Innovate using Advanced AI Technologies: Develop and implement state-of-the-art AI solutions for essential Document Understanding tasks, including OCR/HTR, transcription, Named Entity Recognition (NER), Relation Extraction (RE), Coreference Resolution, Summarization, and Knowledge Graphs while working with a variety of genealogical and historical collections such as newspapers, city directories, family history books, and vital records (e.g., birth, marriage, and death records).

• Analyze and Enhance Multi-Modal Models: Assess the performance of multi-modal models within zero-shot and few-shot learning contexts for thorough document comprehension.

• Design Agentic Systems: Create and implement multi-agent workflows using frameworks such as LangChain, LangGraph, CrewAI, or AutoGen to automate intricate multi-step reasoning tasks in the analysis of historical documents.

• Evaluation and Monitoring: Set up "LLM-as-a-Judge" frameworks and use tools like Arize Phoenix, DeepEval, or RAGAS to monitor for hallucinations, drift, and bias.

• Collaborate on Cloud Implementation: Work closely with ML Ops and Data Science Engineers to efficiently deploy datasets, models, and pipelines within cloud environments.

• Communicate Findings Effectively: Present your insights, deliverables, and proposed solutions clearly and confidently to both technical and non-technical audiences, including teams, stakeholders, and executives.


⛳️ Requirements

• Currently enrolled in an advanced degree program (Master's or PhD preferred) in Computer Science, Data Science, Statistics, Mathematics, Linguistics, Engineering, or a related quantitative discipline with a strong emphasis on data.

• Specialization in AI & LLMs with knowledge of foundation models such as GPT, Gemini, Qwen, Llama, Claude, etc.

• Experience with efficient fine-tuning and inference optimization techniques and tools, including vLLM, LoRA, QLoRA, and quantization.

• Familiarity with embeddings, vector databases, and transformer models, along with software development experience.

• Strong proficiency in Python and relevant libraries and tools for transformer models, multi-modal models, general NLP, and agentic workflows (e.g., Hugging Face Transformers, LangChain, LangGraph, CrewAI, AgentCore).

• Knowledge of cloud platforms and associated AI/ML services, such as Google Cloud Platform (GCP), Gemini API, Vertex AI, AWS EC2, S3, SageMaker, Model Registry, and Bedrock, is advantageous.


🏝️ Benefits

• Flexible work arrangements

• Professional development opportunities
