This is a fully remote position, open to applicants in California, +4 more states.

📋 Description

• Manage Distributed Inference: Deploy and configure llm-d and vLLM on Kubernetes clusters.

• Enhance Production Efficiency: Conduct performance testing and optimize vLLM settings.

• Collaborate on Code Development: Partner with customer engineers to produce high-quality production code.

• Tackle Complex Challenges: Resolve intricate interactions between model architectures and hardware accelerators.

• Establish Feedback Mechanisms: Relay insights from the field back to product development.

⛳️ Requirements

• Over 8 Years of Engineering Experience

• Strong Customer Engagement Skills

• Proactive Approach to Problem Solving

• Extensive Knowledge of Kubernetes

• Expertise in AI Inference

• Proficient in Systems Programming with Python and Go

• Familiarity with Infrastructure as Code, including Helm, Terraform, or similar tools

• Understanding of Cloud and GPU Hardware

• Experience with open-source AI infrastructure projects is advantageous

• Familiarity with Envoy Proxy or Inference Gateway (IGW) is a bonus

🏝️ Benefits

• Comprehensive medical, dental, and vision coverage

• Flexible Spending Account for healthcare and dependent care

• Health Savings Account for high-deductible medical plans

• 401(k) retirement plan with employer matching

• Paid time off and holidays

• Paid parental leave for all new parents

• Leave benefits, including disability leave, paid family medical leave, and paid military leave

• Additional perks including employee stock purchase plan, family planning reimbursement, tuition reimbursement, transportation expense account, and employee assistance program

Forward Deployed Engineer, AI Inference, vLLM, Kubernetes
