
Applied ML Engineer, Data
Posted May 25

This is a fully remote position, open to applicants in Europe.
• Develop and maintain data pipelines for large video generation models, covering data ingestion, parsing, filtering, preprocessing, and large-scale dataset curation, using tools such as AWS S3 and DynamoDB.
• Create and manage annotation workflows across crowdsourcing platforms such as MTurk (Amazon Mechanical Turk) and Prolific, including task design, quality assurance, and label validation.
• Train, assess, and enhance smaller supporting models utilized for data filtering, quality evaluation, preprocessing, or other segments of the ML pipeline.
• Collaborate closely with research and engineering teams to convert experimental workflows into scalable, repeatable systems that facilitate model training and evaluation.
• Ensure data quality throughout the pipeline by identifying bottlenecks, failure modes, and sources of low quality, while continuously refining tools and processes.
• Develop internal tools and automation systems that simplify dataset preparation, launch annotation tasks, monitor outputs, and support comprehensive model development.
• Lead significant pipeline projects from inception to completion, such as new dataset creation initiatives or enhancements to labeling and preprocessing infrastructure.
• Operate within a Kubernetes-based training infrastructure, ensuring datasets are accurately prepared, formatted, and supplied to training clusters.
• Profile and optimize research model inference scripts used in preprocessing phases, ensuring that model-driven filtering and transformation stages adhere to practical time and cost constraints for large-scale raw data.
• A minimum of 3 years of experience in machine learning, applied ML, data pipelines, or related engineering positions, preferably focusing on large-scale multimodal, video, or vision-based systems.
• Strong programming skills in Python and extensive experience building reliable data processing and preprocessing pipelines for ML workflows.
• Practical experience in preparing training data for ML models, including parsing, filtering, dataset curation, quality control, and managing large-scale data using tools like AWS S3 and DynamoDB.
• Knowledge of annotation and labeling workflows, including task design, vendor or crowd-platform management such as MTurk or Prolific, and techniques to ensure label quality.
• Experience with Kubernetes for orchestrating distributed workloads, including data preprocessing, pipeline execution, and dataset delivery to training clusters.
• Familiarity with cloud and on-demand compute environments such as AWS and RunPod, with the capacity to adapt and optimize pipelines across different infrastructures.
• Experience with distributed data processing frameworks and designing systems that operate reliably at scale across multiple nodes or workers.
• Proficiency in PyTorch and the broader deep learning ecosystem, with the ability to read, debug, and optimize research model inference code for use in production preprocessing pipelines.
• Ability to work collaboratively across research and engineering teams, translating experimental concepts into effective, scalable systems.
• Bachelor's, Master's, or PhD in Computer Science, Machine Learning, Engineering, Mathematics, or a related technical field; experience in generative video, computer vision, or multimodal ML is highly desirable.
• Bonus: Experience in training, evaluating, or fine-tuning smaller ML models used for classification, filtering, ranking, quality assessment, or other supportive tasks within an ML pipeline.
• Competitive salary and substantial company equity
• Comprehensive medical, dental, and vision insurance – 99.99% of premiums covered by Cantina
• 42 days of paid time off, including:
  • 15 PTO days
  • 10 sick days
  • 15 company holidays
  • 2 floating holidays
• Generous parental leave and fertility support
• 401(k) retirement savings plan
• Lifestyle spending account – $500/month to use at your discretion
• Complimentary lunch and snacks for in-office employees
• One Medical membership, and more!