This is a fully remote position, open to applicants in the United States.

📋 Description

• Design, develop, and manage enterprise AI systems within our client portfolio.

• Oversee the complete AI stack, from inference engines and platform infrastructure to application-level engineering.

• Lead the comprehensive design, development, and management of AI systems on AI Factory platforms across various client initiatives.

• Engineer and optimize LLM inference serving stacks, primarily focusing on vLLM while also covering the broader inference ecosystem to meet client latency, throughput, and cost objectives.

• Enhance inference performance through KV cache management, paged attention, batching strategies, and Dynamo-based disaggregated serving.

• Design and manage MLOps pipelines encompassing model lifecycle, registries, deployment, rollback, and observability.

• Design and build RAG applications backed by vector databases.

• Develop and optimize prompt-engineering patterns at production scale.

• Engineer high-performance storage and networking solutions for AI workloads.

• Manage Kubernetes clusters that support AI workloads.

• Build and maintain container images, registries, and CI/CD pipelines for AI/ML services.

• Implement monitoring, alerting, logging, and capacity planning across the AI stack.

• Secure environments to comply with client security and regulatory requirements.

• Lead troubleshooting efforts across diverse environments and technologies.

• Directly engage with client stakeholders—both technical and executive—to communicate updates, root causes, options, and recommendations.

• Mentor junior engineers and review their code, raising the technical standard of every engagement you participate in.

• Author runbooks, reference architectures, and knowledge base content; lead client knowledge transfer and enablement sessions.

• Participate in on-call rotations and incident response for production AI workloads.

• Contribute reusable patterns, tools, and reference designs back to the practice.

⛳️ Requirements

• Over 7 years of experience in software, data, or infrastructure engineering, with at least 3 years specifically focused on modern AI / LLM systems.

• Engineering-level proficiency in production-quality Python, including testing, code review, version control, and delivering code that others rely on.

• Extensive production experience with Linux, including system internals, performance tuning, and troubleshooting.

• Advanced expertise in Docker—covering image building, registry management, runtime tuning, and container security.

• Strong server-platform knowledge including CPU/GPU architectures, PCIe, BMC management, BIOS/firmware lifecycle, and physical-to-logical troubleshooting.

• Hands-on experience deploying and managing one or more of HPE PCAI, Dell AI Factory, or Nutanix Enterprise AI.

• Practical experience in deploying, tuning, and operating vLLM.

• Familiarity with multiple inference and model-serving frameworks beyond vLLM, with the capability to select and optimize the appropriate tool for each workload.

• Hands-on experience with high-throughput, low-latency storage and network infrastructures for AI workloads—including RDMA-class interconnects, parallel/object storage tiers, KV cache management, and Dynamo-style disaggregated serving.

• Practical experience in operating MLOps tools and methodologies—model registries, deployment pipelines, GitOps, lineage, and rollback.

• Experience in deploying, tuning, and integrating vector databases and RAG pipelines, alongside the application-level engineering that supports them.

• Proven experience designing system prompts, structured outputs, function calling, and tool-using LLM patterns.

• Demonstrated ability to create LLM evaluation frameworks—golden sets, regression suites, and quality/cost metrics.

• Proven ability to engage directly with client stakeholders—facilitating working sessions, presenting recommendations, and translating technical details for non-technical audiences.

• Excellent written and verbal communication skills—creating clear reference architectures, runbooks, and incident reports.

• A history of mentoring junior engineers and enhancing team technical quality through code reviews and collaborative work.

• Knowledge of TCP/IP, DNS, load balancing, VLANs, and firewall management.

• Comfortable working across multiple concurrent client environments and managing competing priorities under SLAs.

🏝️ Benefits

• Competitive salary and performance-based bonuses.

• Comprehensive health, dental, and vision insurance.

• Generous paid time off and flexible working arrangements.

• Opportunities for professional development and continuous learning.

• Collaborative and innovative work environment.

Senior AI Engineer
