
Forward Deployment Engineer, Generative AI
Posted 6 days ago

This is a fully remote position, open to applicants in the United States.
• The Forward Deployment Engineer (FDE) is responsible for the on-site deployment, integration, and scaling of our enterprise Generative AI solutions.
• This position involves direct collaboration with customer engineering teams to operationalize Large Language Models (LLMs) and retrieval systems across various multi-cloud platforms (AWS, Azure, GCP).
• You will serve as a bridge between AI research and production-grade cloud infrastructure.
• You will collaborate with cross-functional teams and business partners, applying your analytical skills to shape current and future strategy, deliver business value, and communicate results clearly.
• AI Solution Deployment: Expertise in deploying, fine-tuning, and optimizing large-scale Gen AI models and LLM orchestration frameworks within customer cloud environments.
• Infrastructure Engineering: Ability to design scalable infrastructure for AI workloads by utilizing GPU/TPU orchestration, high-performance storage, and low-latency networking.
• Data & Retrieval Pipelines: Proficiency in designing and implementing high-throughput data ingestion pipelines and Vector Database architectures for Retrieval-Augmented Generation (RAG).
• Multi-Cloud Management: Skills in creating cloud-agnostic, resilient deployments across AWS, Azure, and GCP using Infrastructure as Code (IaC).
• Technical Advocacy: Serve as the primary technical consultant, assisting enterprise clients with AI safety, prompt engineering patterns, and inference cost optimization.
• Product Collaboration: Feed deployment insights and edge cases back to core AI research and platform engineering teams to enhance product robustness.
Technical Requirements
• AI Frameworks: Hands-on experience with LLM orchestration tools (LangChain, LlamaIndex, AutoGen) and deep learning frameworks (PyTorch, Hugging Face).
• Vector Databases: Proven experience in setting up and querying vector stores (Milvus, Pinecone, Qdrant, Chroma, or pgvector).
• Model Operations (LLMOps): Proficient in model serving frameworks (vLLM, TGI, Triton Inference Server) and evaluation tools.
• Cloud & Containers: Advanced knowledge of cloud AI primitives (AWS Bedrock/SageMaker, Azure OpenAI, GCP Vertex AI) and Kubernetes (K8s) for GPU workloads.
• IaC & Automation: Mastery of Terraform or OpenTofu for provisioning complex multi-cloud compute environments.
• Programming: Strong coding skills in Python (preferred) or Go, with a focus on writing clean, concurrent code.
Soft Skills
• AI Consultation: Capability to manage customer expectations regarding LLM non-determinism, hallucinations, and performance trade-offs.
• Rapid Adaptability: Enthusiasm for staying current with the latest developments in the Generative AI field.
• Critical Debugging: Exceptional ability to identify errors across complex software layers, from GPU drivers to prompt engineering logic.
• Mobility: Willingness to travel to client sites to lead high-stakes, on-site deployment sprints.
• This role presents an outstanding opportunity for significant career growth in a dynamic and challenging entrepreneurial environment, characterized by a high level of individual responsibility.
• Tiger Analytics is committed to providing equal employment opportunities to all applicants and employees, without discrimination based on race, color, religion, age, sex, sexual orientation, gender identity/expression, pregnancy, national origin, ancestry, marital status, protected veteran status, disability status, or any other basis protected by federal, state, or local law.