
Senior/Principal AI Performance Engineer
Posted May 6

Responsibilities

• Design, implement, and optimize inference pipelines for large language models and other AI workloads to achieve maximum throughput and minimal latency.
• Utilize cutting-edge optimization methods: quantization (INT4/INT8/FP8), model pruning, speculative decoding, continuous batching, and kernel fusion.
• Enhance inference-serving stacks such as vLLM, TensorRT-LLM, and ONNX Runtime for deployment on CIQ’s OS platform.
• Profile and optimize GPU/accelerator utilization throughout the entire inference stack, including model weights, memory bandwidth, CUDA kernels, and driver overhead.
• Establish performance baselines for inference and implement regression detection across CIQ’s AI-driven solutions.
• Design and refine distributed training pipelines for large-scale models, incorporating data, model, tensor, and pipeline parallelism strategies.
• Improve training efficiency through mixed-precision training, gradient checkpointing, activation recomputation, and enhancements at the optimizer level.
• Benchmark training throughput and scaling efficiency across multi-GPU and multi-node configurations on CIQ’s infrastructure.
• Collaborate with infrastructure and performance teams to identify and resolve training bottlenecks within the network (RDMA/InfiniBand), storage, and OS layers.
• Stay current with new model architectures and training methodologies, including MoE models, RLHF pipelines, and emerging post-training techniques.
• Develop and maintain a library of ready-to-use AI workload examples that operate on CIQ’s platform, covering inference serving, fine-tuning, batch processing, RAG pipelines, and agentic workflows.
• Create both internal reference pipelines for CI/testing and customer-facing examples designed for quick productivity on CIQ’s OS and Fuzzball.
• Package workloads using containers to provide portable, reproducible AI environments across HPC and cloud-native settings.
• Develop engaging, well-documented demonstrations and reference architectures that effectively communicate CIQ’s AI capabilities to both technical and business audiences.
• Collaborate with product and customer success teams to translate practical AI use cases into reusable, production-ready examples.
• Build and maintain AI-driven engineering tools, leveraging LLM-based agents, automated analysis pipelines, and AI-assisted code generation to enhance the broader engineering organization.
• Advocate for an AI-first development culture by identifying areas where AI tools can reduce manual effort, accelerate insights, and enhance software quality across CIQ’s products.
• Assess and incorporate emerging AI frameworks, libraries, and hardware as they become relevant to CIQ’s customers and product roadmap.
• Contribute to open-source AI tools and frameworks where applicable, reinforcing CIQ’s technical reputation within the community.
• Acquire in-depth knowledge of CIQ’s Fuzzball platform, its architecture, scheduling model, and workload execution environment.
• Integrate AI training, inference, and pipeline workloads into Fuzzball-based CI/CD and production pipelines.
• Contribute to Fuzzball’s AI workload narrative, ensuring the platform serves as an optimal environment for running AI workloads efficiently and at scale.
• Assist in characterizing and enhancing Fuzzball’s performance for AI-specific access patterns and resource requirements.
• Develop a comprehensive understanding of the complete CIQ product portfolio, including Rocky Linux, RLC (and its variants), Fuzzball, Apptainer, and Warewulf, and how AI workloads interact with each component.
• Work closely with the Performance Engineering team to ensure that AI workloads benefit from and contribute to CIQ’s systems-level optimization initiatives.
Qualifications

• Extensive, hands-on experience in optimizing LLM inference, including serving frameworks (vLLM, TensorRT-LLM, ONNX Runtime), quantization techniques, and GPU memory management.
• Strong background in distributed AI training, with familiarity with frameworks such as PyTorch FSDP, DeepSpeed, Megatron-LM, or JAX/XLA.
• Proven track record in building production AI pipelines and packaging AI environments for reproducible and portable deployment (containers, Apptainer/Singularity, or equivalent).
• Proficiency with GPU/accelerator profiling tools: NVIDIA Nsight, PyTorch Profiler, CUDA performance analysis, and related tools.
• Knowledge of HPC environments, including job schedulers (Slurm, PBS), parallel filesystems, RDMA/InfiniBand, and MPI, along with the intersection of HPC and modern AI workloads.
• Experience in integrating AI workloads into CI/CD pipelines and developing automated testing and benchmarking frameworks.
• Comfortable using and developing with LLM-based tools and agentic frameworks to enhance engineering productivity.
• Exceptional analytical skills, capable of formulating hypotheses, designing experiments, and deriving actionable conclusions from complex profiling data.
• Strong written and verbal communication abilities, able to present findings to both highly technical audiences and business stakeholders.
• A collaborative, humble, and continuously learning mindset, paired with the confidence to advocate for AI engineering as a priority.
Benefits

• Medical, dental, and vision insurance.
• Flexible paid time off.
• Employee stock options.
• Remote work; no travel required for most positions.