
Technical Staff Member – Data Intelligence
Posted May 6

• Collaborate with model researchers to establish the definition of “good data” for our models, which includes quality metrics, validation checks, and acceptance thresholds.
• Investigate open source datasets and develop internal datasets that are most appropriate for constructing fundamental World Models.
• Create algorithms for automated evaluation of data quality, data domain mixtures, and the adaptation of synthetic data to real data.
• Track datasets, metadata, provenance, and versions so that experiments are reproducible and it is clear which data was used in each training and evaluation run.
• Oversee CI/CD and development tools for the data stack (GitHub, Python, PyTorch), and automate repetitive tasks to streamline workflows.
• Evaluate and enhance throughput, storage, and compute utilization across pipelines and associated assets.
• Strong foundational knowledge in ML and deep learning, with experience in building and managing large-scale data and/or computing systems.
• Comfortable moving between research questions and production engineering: able to analyze data and deploy reliable systems.
• Proven research experience related to data composition, data quality, and dataset releases.
• Skill in designing and executing experiments that yield convincing and unbiased results.
• Practical experience with distributed processing and orchestration tools (such as Spark, Ray, Airflow, or similar alternatives).
• Proficient in Python, with familiarity with the tools used in contemporary model training workflows (datasets, checkpoints, experiment tracking).
• Strong understanding of data quality: methods for measurement, monitoring, and preventing regressions as systems scale.
• Capable of thriving in a dynamic environment, prioritizing effectively, and communicating clearly with both researchers and engineers.
• Bonus: experience with large video datasets, dataset curation for training purposes, or developing internal tools for evaluation/analysis in ML environments.
• Flexible work arrangements