Remotery

Applied ML Engineer, Data

at Cantina | Europe | Full-time | Machine Learning Engineer | Mid-level, Senior | $200k – $260k/year

Posted May 25

This is a fully remote position, open to applicants in Europe.

📋 Description

• Build and maintain data pipelines for large-scale video generation models, covering data ingestion, parsing, filtering, preprocessing, and large-scale dataset curation, using tools such as AWS S3 and DynamoDB.

• Create and manage annotation workflows on platforms such as Amazon Mechanical Turk (MTurk) and Prolific, including task design, quality assurance, and label validation.

• Train, evaluate, and improve smaller supporting models used for data filtering, quality evaluation, preprocessing, or other stages of the ML pipeline.

• Collaborate closely with research and engineering teams to convert experimental workflows into scalable, repeatable systems that facilitate model training and evaluation.

• Ensure data quality throughout the pipeline by identifying bottlenecks, failure modes, and sources of low quality, while continuously refining tools and processes.

• Develop internal tools and automation that simplify dataset preparation, launch annotation tasks, monitor outputs, and support end-to-end model development.

• Lead significant pipeline projects from inception to completion, such as new dataset creation initiatives or enhancements to labeling and preprocessing infrastructure.

• Operate within a Kubernetes-based training infrastructure, ensuring datasets are accurately prepared, formatted, and supplied to training clusters.

• Profile and optimize research model inference scripts used in preprocessing phases, ensuring that model-driven filtering and transformation stages adhere to practical time and cost constraints for large-scale raw data.


⛳️ Requirements

• A minimum of 3 years of experience in machine learning, applied ML, data pipelines, or related engineering positions, preferably focusing on large-scale multimodal, video, or vision-based systems.

• Strong programming skills in Python and extensive experience building reliable data processing and preprocessing pipelines for ML workflows.

• Practical experience in preparing training data for ML models, including parsing, filtering, dataset curation, quality control, and managing large-scale data using tools like AWS S3 and DynamoDB.

• Knowledge of annotation and labeling workflows, including task design, vendor or crowd-platform management such as MTurk or Prolific, and techniques to ensure label quality.

• Experience with Kubernetes for orchestrating distributed workloads, including data preprocessing, pipeline execution, and dataset delivery to training clusters.

• Familiarity with cloud and on-demand compute environments such as AWS and RunPod, with the capacity to adapt and optimize pipelines across different infrastructures.

• Experience with distributed data processing frameworks and designing systems that operate reliably at scale across multiple nodes or workers.

• Proficiency in PyTorch and the broader deep learning ecosystem, with the ability to read, debug, and optimize research model inference code for use in production preprocessing pipelines.

• Ability to work collaboratively across research and engineering teams, translating experimental concepts into effective, scalable systems.

• Bachelor's, Master's, or PhD in Computer Science, Machine Learning, Engineering, Mathematics, or a related technical field; experience in generative video, computer vision, or multimodal ML is highly desirable.

• Bonus: Experience in training, evaluating, or fine-tuning smaller ML models used for classification, filtering, ranking, quality assessment, or other supportive tasks within an ML pipeline.


🏝️ Benefits

• Competitive salary and substantial company equity

• Comprehensive medical, dental, and vision insurance – 99.99% of premiums covered by Cantina

• 42 days of paid time off, including:

    • 15 PTO days

    • 10 sick days

    • 15 company holidays

    • 2 floating holidays

• Generous parental leave and fertility support

• 401(k) retirement savings plan

• Lifestyle spending account – $500/month to use at your discretion

• Complimentary lunch and snacks for in-office employees

• One Medical membership, and more!
