
Machine Learning Engineer – Pre Training
Posted Jun 20

This is a fully remote position, open to applicants in the United States.
• Develop scalable pre-training pipelines for foundation models, improving throughput and efficiency.
• Execute distributed training strategies utilizing GPUs/TPUs and high-performance clusters.
• Work alongside researchers to convert experimental configurations into production-ready workflows.
• Create monitoring and fault-tolerance systems to guarantee reliable large-scale training.
• Regularly benchmark and optimize performance across hardware and software ecosystems.
• Bachelor’s, Master’s, or PhD in Computer Science, Engineering, or a related discipline, or equivalent experience.
• Over 2 years of experience with large-scale model training and distributed systems.
• Proficiency in Python and familiarity with ML frameworks such as PyTorch, TensorFlow, or JAX.
• Experience in GPU scheduling, memory optimization, and parallelism techniques.
• Comfortable working in containerized and orchestrated environments (Docker/Kubernetes).
• Knowledge of high-performance computing and network bottlenecks.
• Competitive salary and comprehensive benefits package.
• Opportunities for professional development and continuous learning.
• Collaborative and innovative work environment.
• Flexible working hours and remote work options.
Onsights.io