Remotery

LLM Inference Deployment Engineer

Posted 5 days ago

This is a fully remote position, open to applicants in the United States.

📋 Description

• Implement and enhance trained large language models (LLMs) such as GPT, LLaMA, Mistral, and Falcon, using resources such as Hugging Face.

• Leverage inference runtimes including ONNX Runtime and vLLM to ensure efficient execution.

• Improve LLM scalability in real-time applications by optimizing batching, caching, and tensor parallelism.

• Build and maintain high-performance inference pipelines using Docker, Kubernetes, and various inference servers.


⛳️ Requirements

• A Bachelor’s or Master’s degree in Computer Science, Electrical Engineering, or a related discipline.

• Proven experience in deploying LLM inference, optimizing models, and engineering runtimes.

• Strong proficiency in LLM inference frameworks such as PyTorch, ONNX Runtime, vLLM, TensorRT-LLM, and DeepSpeed.

• Comprehensive knowledge of the Python programming language for model integration and performance enhancements.

• Solid understanding of high-level model representations, with experience implementing framework-level optimizations for generative AI applications.

• Familiarity with containerized AI deployments using tools like Docker, Kubernetes, Triton Inference Server, TensorFlow Serving, and TorchServe.

• Extensive knowledge of memory optimization techniques for LLMs in long-context scenarios.

• Experience with real-time LLM applications, including chatbots, code generation, and retrieval-augmented generation.


🏝️ Benefits

• Competitive salary and performance-based bonuses.

• Comprehensive health, dental, and vision insurance.

• Opportunities for professional development and continuous learning.

• Flexible work hours and remote work options.

