
Senior/Principal Performance Engineer
Posted May 6

• Design, develop, and sustain extensive benchmarking frameworks that encompass OS, kernel, and application layers.
• Profile workloads across CPU, memory, I/O, network, and accelerator (GPU/NPU) subsystems to uncover bottlenecks and optimization possibilities.
• Establish and take ownership of performance baselines across CIQ's suite of products and solutions.
• Utilize AI-assisted tools and agentic workflows to expedite profiling, analysis, and identification of root causes.
• Construct and manage automated performance regression-detection pipelines that are integrated into CI/CD workflows using Fuzzball.
• Identify, triage, and resolve regressions in user space, kernel space, and application layers promptly and diligently.
• Collaborate with engineering teams to determine the root causes of regressions stemming from upstream kernel alterations, compiler updates, or library changes.
• Promote proactive performance enhancements—rather than just reactive fixes—to maintain CIQ solutions' competitive edge across every stack layer.
• Oversee core operating system performance: tuning kernel subsystems (scheduler, memory management, I/O, networking), reducing system call overhead, and optimizing user space libraries and runtimes.
• Identify and implement kernel-level improvements, including patches, configuration adjustments, and upstream contributions, that yield significant performance gains for CIQ's customer workloads.
• Optimize AI inference and training workloads, including LLM serving, model parallelism, and accelerator utilization.
• Fine-tune performance for HPC workloads, including modeling, simulation, and tightly coupled parallel applications (MPI, OpenMP, etc.).
• Enhance general computing and service workloads, including web services, databases, messaging systems, and other production software running on CIQ's OS platform.
• Operate at all levels of the stack: compiler flags, kernel parameters, scheduler tuning, NUMA topology, memory allocation, and application-level algorithmic enhancements.
• Advocate for an AI-first engineering philosophy—leveraging AI tools, agents, and automation to boost personal productivity and the quality of performance insights.
• Identify and prioritize optimization avenues that have a direct impact on AI training throughput and inference latency/cost.
• Stay up to date with cutting-edge techniques in ML system performance, including quantization, batching strategies, kernel fusion, and hardware-software co-design.
• Cultivate in-depth knowledge of CIQ's Fuzzball platform—its architecture, scheduling, and workload execution model.
• Integrate performance benchmarks, regression tests, and user-facing workloads into Fuzzball-based pipelines.
• Contribute to the performance characterization of Fuzzball itself, ensuring minimal overhead and efficient scalability of the platform.
• Develop a comprehensive understanding of the full CIQ product portfolio (Rocky Linux and RLC and its variants, Fuzzball, Apptainer (formerly Singularity), and Warewulf) and of how performance considerations interconnect across all of these products.
• Collaborate closely with the engineering teams behind each product line to identify, prioritize, and implement performance enhancements that benefit customers throughout the entire CIQ ecosystem.
• Partner with product and customer success teams to translate real-world performance challenges into engineering priorities and measurable results.
• Document and communicate findings effectively—from low-level profiling data to high-level executive summaries.
• Contribute to technical publications, conference presentations, and thought leadership that bolsters CIQ's reputation for performance excellence.
• A profound, principled understanding of operating system internals—Linux kernel scheduler, memory subsystem, I/O stack, and networking.
• Proven experience in identifying and resolving performance regressions across kernel and user space within production environments.
• Hands-on proficiency with profiling and tracing tools such as perf, eBPF/bpftrace, flame graphs, VTune, Nsight, strace, and ftrace.
• Strong foundation in AI/ML workload performance—including inference optimization (TensorRT, ONNX, vLLM, or similar), training efficiency, and GPU/accelerator utilization.
• Experience with HPC workloads: MPI, OpenMP, parallel filesystems, RDMA/InfiniBand, and job schedulers (Slurm, PBS, etc.).
• Familiarity with modern AI-centric development workflows and comfort in using LLM-based tools to enhance engineering productivity.
• Experience in developing automated performance testing and regression detection pipelines within CI/CD settings.
• Exceptional analytical abilities—capable of forming hypotheses, designing experiments, and deriving actionable conclusions from complex datasets.
• Strong verbal and written communication skills; adept at presenting findings to both highly technical audiences and business stakeholders.
• A collaborative, humble, and continuously learning mindset—paired with the confidence to advocate for performance as a paramount engineering concern.
• Medical, dental, and vision insurance.
• Flexible paid time off.
• Employee stock options.
• Remote work; no travel required for most positions.