This is a fully remote position, open to applicants in the United States.

📋 Description

• Design, develop, and sustain extensive benchmarking frameworks that encompass OS, kernel, and application layers.

• Analyze workloads across CPU, memory, I/O, network, and accelerator (GPU/NPU) subsystems to pinpoint bottlenecks and areas for optimization.

• Establish and take ownership of performance baselines throughout CIQ's product and solutions portfolio.

• Utilize AI-assisted tools and agentic workflows to expedite profiling, analysis, and identification of root causes.

• Create and manage automated performance regression-detection pipelines that are integrated into CI/CD workflows using Fuzzball.

• Identify, triage, and resolve regressions in user space, kernel space, and application layers with a sense of urgency and thoroughness.

• Collaborate with engineering teams to trace regressions caused by upstream kernel changes, compiler updates, or library modifications.

• Proactively drive performance enhancements, pursuing forward-looking improvements rather than only reactive fixes, to maintain CIQ's competitive edge across all stack layers.

• Oversee core operating system performance, including kernel subsystem tuning (scheduler, memory management, I/O, networking), system call overhead reduction, and optimizations for user space libraries and runtimes.

• Identify and apply kernel-level enhancements, such as patches, configuration changes, and upstream contributions that yield measurable performance improvements for CIQ's customer workloads.

• Optimize workloads for AI inference and training, including LLM serving, model parallelism, and accelerator utilization.

• Fine-tune performance for HPC workloads, including modeling, simulation, and tightly coupled parallel applications (MPI, OpenMP, etc.).

• Enhance general computing and service workloads—including web services, databases, messaging systems, and other production software on CIQ's OS platform.

• Operate at all stack levels: adjusting compiler flags, kernel parameters, scheduler settings, NUMA topology, memory allocation, and application-level algorithmic improvements.

• Advocate for an AI-first engineering philosophy—utilizing AI tools, agents, and automation to enhance both personal productivity and the quality of performance insights.

• Identify and prioritize optimization opportunities that significantly affect AI training throughput and inference latency/cost.

• Remain up to date on cutting-edge techniques in ML system performance, including quantization, batching strategies, kernel fusion, and hardware-software co-design.

• Develop in-depth expertise in CIQ's Fuzzball platform, focusing on its architecture, scheduling, and workload execution model.

• Integrate performance benchmarks, regression tests, and user-facing workloads into Fuzzball-based pipelines.

• Contribute to the performance characterization of Fuzzball, ensuring minimal overhead and efficient scaling of the platform.

• Gain comprehensive familiarity with CIQ's entire product portfolio—including Rocky Linux and RLC (and its variants), Fuzzball, Apptainer (formerly Singularity), and Warewulf—understanding how performance factors interconnect across each.

• Collaborate extensively with engineering teams behind each product line to highlight, prioritize, and implement performance improvements that benefit customers throughout the CIQ ecosystem.

• Partner with product and customer success teams to translate real-world performance challenges into engineering priorities and measurable outcomes.

• Clearly document and communicate findings—from low-level profiling data to executive-level summaries.

• Contribute to technical publications, conference presentations, and thought leadership that reinforces CIQ's commitment to performance excellence.

⛳️ Requirements

• Deep, principled understanding of operating system internals, including the Linux kernel scheduler, memory subsystem, I/O stack, and networking.

• Proven track record in identifying and resolving performance regressions in both kernel and user space within production settings.

• Practical expertise with profiling and tracing tools such as perf, eBPF/bpftrace, Flamegraphs, VTune, Nsight, strace, ftrace, and others.

• Strong background in AI/ML workload performance, encompassing inference optimization (TensorRT, ONNX, vLLM, or similar), training efficiency, and GPU/accelerator utilization.

• Experience with HPC workloads, including MPI, OpenMP, parallel filesystems, RDMA/InfiniBand, and job schedulers (Slurm, PBS, etc.).

• Familiarity with modern AI-first development workflows and comfort in using LLM-based tools to accelerate engineering tasks.

• Experience in constructing automated performance testing and regression detection pipelines within CI/CD frameworks.

• Exceptional analytical abilities—capable of forming hypotheses, designing experiments, and deriving actionable insights from complex data.

• Strong written and verbal communication skills; adept at presenting findings to both technical audiences and business stakeholders.

• A collaborative, humble mindset and a commitment to continuous learning, paired with the confidence to advocate for performance as a primary engineering concern.

🏝️ Benefits

• Medical, dental, and vision insurance.

• Flexible paid time off.

• Employee stock options.

• Remote work; no travel required for most positions.

Senior/Principal Performance Engineer

