
AI Research Engineer – Kernel, Inference Optimization
Posted Jun 4

This is a fully remote position, open to applicants in the Netherlands.
• Design and implement model serving architectures that achieve high throughput and minimal latency.
• Ensure efficient operation of pipelines across various environments, including resource-limited devices and edge platforms.
• Set clear performance benchmarks for latency and memory utilization.
• Conduct, manage, and oversee controlled inference tests.
• Monitor key performance indicators such as response latency and memory usage.
• Document iterative findings and evaluate results against established benchmarks.
• Assess computational efficiency and identify bottlenecks within the serving pipeline.
• Collaborate with cross-functional teams to integrate optimized frameworks into production systems.
• Establish success metrics aimed at enhancing performance and scalability.
• A degree in Computer Science or a related discipline.
• Preferably a PhD in NLP, Machine Learning, or a related field, along with a strong record in AI R&D (with notable publications in top-tier conferences).
• Familiarity with Metal Shading Language (MSL).
• Proficiency in writing custom compute shaders from scratch.
• Demonstrated experience in low-level kernel optimizations and inference enhancements for mobile devices.
• Past contributions that improved inference latency, throughput, and memory efficiency for specific applications.
• A comprehensive understanding of contemporary model serving architectures and inference optimization strategies.
• Strong skills in writing GPU kernels for mobile platforms.
• Hands-on experience in developing and deploying end-to-end inference pipelines.
• Ability to apply empirical research to address challenges in model serving.
• Skilled in designing robust evaluation frameworks and refining optimization methods.
• Experience with distributed inference systems that use tensor parallelism, pipeline parallelism, and expert parallelism.
• Knowledge of pruning, quantization, FlashAttention, KV caching, and speculative decoding (EAGLE).
• Work remotely from any location around the globe.
• Opportunity to innovate within the fintech sector.
• Collaborate with talent from around the world.
• Competitive compensation packages available.
• Flexible working arrangements offered.
Tether.to