
Machine Learning Systems Engineer
Posted Jun 21

This is a fully remote position, open to applicants in the United States.
Responsibilities
• Performance Profiling & Optimization: Use profiling tools (such as Nsight and PyTorch Profiler) to pinpoint bottlenecks in data loading, gradient computation, and communication.
• Apply optimizations such as kernel fusion, sharding, and tiling to reduce step time.
• Refine distributed training pipelines built on frameworks such as PyTorch Distributed.
• Design and maintain high-performance GPU kernels in Triton or CUDA for cutting-edge ML workloads.
• Improve data loading pipelines to make them robust and maximize training throughput.
Requirements
• Bachelor’s, Master’s, or PhD degree in Computer Science, Computer Engineering, or a related technical field.
• Strong command of Python.
• Significant hands-on experience with PyTorch.
• Experience optimizing machine learning model execution for both training and inference.
• Solid understanding of core machine learning concepts, architectures, and processes.
• Outstanding analytical and problem-solving abilities, with a proactive mindset and a data-driven approach.
Benefits
• Medical
• Dental
• Vision
• 401k with a company match
• Health savings accounts
• Life insurance
• Pet insurance
• Additional perks
Jellyfish
ScalableOS
Pragmatike