Remotery

Applied Data Scientist, LLM Evaluation

at Driver · US (Texas) · Full-time · Mid-level / Senior · $175k – $275k/year

Posted 1 day ago

This is a fully remote position, open to applicants in Texas.

📋 Description

• Take ownership of the LLM evaluation strategy at Driver, guiding it from foundational principles to production-level infrastructure.

• Establish quality metrics and create evaluation datasets.

    • Define the criteria for what constitutes 'good' for each content type throughout the pipeline.

    • Develop and maintain gold-standard evaluation datasets across various languages and repository types (monorepos, microservices, libraries, applications).

    • Create rubrics that assess accuracy, completeness, usefulness, and readability.

• Construct benchmarking and experimentation infrastructure.

    • Develop automated evaluation pipelines that measure output against reference datasets.

    • Instrument the content generation pipeline to facilitate A/B comparisons: running the same codebase through two different strategies and analyzing the results.

    • Create tools for LLM-as-judge evaluation and regression detection.

    • Incorporate evaluation into CI so that pipeline modifications are accompanied by quality evidence.

• Generate automated quality signals at scale.

    • Implement quality checks that identify degraded output without necessitating human review of every document.

    • Track content quality trends over time.

    • Design sampling strategies for human review that optimize signal detection with minimal annotation effort.

• Measure trade-offs and inform decision-making.

    • Conduct experiments on model selection, context strategies, and changes to pipeline architecture.

    • Analyze cost, quality, and latency trade-offs.

    • Collaborate with the engineering team to translate evaluation insights into tangible improvements.


⛳️ Requirements

• Bachelor's, Master's, or PhD in Statistics, Machine Learning, Data Science, Computational Linguistics, or a related quantitative discipline.

• 3 to 5+ years of experience in applied science, ML engineering, or data science roles with a focus on evaluation, NLP, or generative AI; 7+ years preferred.

• Strong foundation in statistics: experimental design, hypothesis testing, confidence intervals, effect sizes, and power analysis.

• Experience in designing and executing evaluations for LLM or NLP systems — you have carefully considered what 'better' signifies when dealing with open-ended text outputs.

• Proficient in Python and the scientific/data stack (pandas, NumPy, scipy, sklearn).

• Comfortable utilizing Jupyter notebooks for exploration and prototyping, and transforming that work into automated pipelines.

• Familiar with LLM-as-judge methodologies, inter-annotator agreement, and rubric design for subjective quality evaluation.

• Knowledgeable about the practical challenges associated with non-deterministic systems: variance decomposition, multi-run methodology, and differentiating signal from noise at scale.

• Strong data storytelling skills — able to convert experimental results into clear recommendations that inform engineering and product strategies.


🏝️ Benefits

• Competitive Compensation Packages - Cash & Equity

• Flexible Work Culture

• Unlimited Time Off + 12 Paid Company Holidays

• Insurance - Health, Dental, & Vision

• Life Insurance & FSA Accounts

• 401(k) Retirement Accounts - Traditional, Roth, or Both

• Quarterly Team Offsites
