
Technical Staff Member – Data Intelligence
Posted 1 day ago

This is a fully remote position, open to applicants in the United States.
• Collaborate with model researchers to establish the definition of “good data” for our models, encompassing quality metrics, validation checks, and acceptance thresholds.
• Investigate open-source datasets and develop internal datasets that are best suited to building foundational World Models.
• Create algorithms for the automated assessment of data quality, management of data domain mixtures, and adaptation from synthetic to real data.
• Track datasets, metadata, provenance, and versions so that experiments are reproducible and it is clear which data was used in each training and evaluation run.
• Oversee CI/CD and development tools for the data stack (GitHub, Python, PyTorch), while automating repetitive workflows to minimize friction.
• Analyze and enhance throughput, storage, and compute utilization across pipelines and associated assets.
• Strong foundational knowledge in ML and deep learning, with experience in building and managing large-scale data and/or compute systems.
• Proficient in moving between research questions and production engineering: capable of inspecting data, running analyses, and deploying reliable systems.
• Proven research experience with data composition, data quality, and dataset releases.
• Skilled in designing and executing experiments that yield convincing, unbiased results.
• Hands-on experience with distributed processing and orchestration (e.g., Spark, Ray, Airflow, or similar tools).
• Strong proficiency in Python, along with familiarity with the tooling of modern model training workflows (datasets, checkpoints, experiment tracking).
• Strong understanding of data quality: how to measure it, monitor it, and prevent regressions as systems scale.
• Capable of thriving in a fast-paced environment, prioritizing key tasks, and communicating effectively with both researchers and engineers.
• Bonus: experience with large video datasets, dataset curation for training, or development of internal tools for evaluation/analysis in ML environments.
• Flexible work arrangements.