
Staff ML Infrastructure Engineer
Posted Jun 27

This is a fully remote position, open to applicants in California.
• Oversee the design, implementation, and deployment of scalable platforms and tools that facilitate machine learning model training and evaluation workflows throughout GM.
• Manage complex technical projects from start to finish, making critical architectural decisions and technical trade-offs.
• Serve as a key contributor in team planning, design evaluations, and ensuring code quality.
• Adopt a comprehensive perspective on projects, assessing their impact across various teams and over extended timelines.
• Actively promote technical prioritization.
• Work collaboratively with partner teams to maximize the benefits of the systems we develop.
• Help shape our team by conducting technical interviews, maintaining high and well-calibrated standards, and playing a pivotal role in recruitment efforts.
• Guide and onboard junior engineers and interns, assisting them in their career development.
• 5+ years of experience in developing large-scale distributed systems, applications, or advanced machine learning systems.
• Demonstrated success in designing robust frameworks with high-quality, resilient APIs.
• Profound understanding of machine learning algorithms with practical application experience.
• Expertise in creating reliable, high-performance, and cost-effective systems on contemporary cloud infrastructure.
• Comprehensive experience throughout the ML development lifecycle, including MLOps practices.
• Strong collaboration skills across teams and organizations.
• Exceptional coding proficiency in Python or C++.
• Keen interest in autonomous driving and its transformative capabilities.
• Bachelor’s, Master’s, or PhD in Computer Science or Mathematics, or equivalent practical experience.
• Preferred: Experience with distributed training methodologies.
• Experience in scaling ML training across extensive GPU/CPU clusters or other accelerators.
• Familiarity with deep learning frameworks (e.g., PyTorch, TensorFlow).
• Experience with performance profiling and state-of-the-art training optimization techniques, and an understanding of their impact on model performance.
• Experience with advanced build systems (e.g., Bazel, Buck, Blaze, CMake).
• Proficiency in containerization and orchestration technologies (e.g., Docker, Kubernetes).
• Medical
• Dental
• Vision
• Health Savings Account
• Flexible Spending Accounts
• Retirement savings plan
• Sickness and accident benefits
• Life insurance
• Paid vacation & holidays
• Tuition assistance programs
• Employee assistance program
• GM vehicle discounts and more.