Remotery

Senior/Principal Performance Engineer

Posted Jun 19

This is a fully remote position, open to applicants in the United States.

📋 Description

• Design, develop, and sustain extensive benchmarking frameworks that encompass OS, kernel, and application layers.

• Analyze workloads across CPU, memory, I/O, network, and accelerator (GPU/NPU) subsystems to pinpoint bottlenecks and areas for optimization.

• Establish and take ownership of performance baselines throughout CIQ's product and solutions portfolio.

• Utilize AI-assisted tools and agentic workflows to expedite profiling, analysis, and identification of root causes.

• Create and manage automated performance regression-detection pipelines that are integrated into CI/CD workflows using Fuzzball.

• Identify, triage, and resolve regressions in user space, kernel space, and application layers with a sense of urgency and thoroughness.

• Collaborate with engineering teams to trace regressions caused by upstream kernel changes, compiler updates, or library modifications.

• Proactively drive performance improvements, rather than only reacting to regressions, to maintain CIQ's competitive edge across all stack layers.

• Oversee core operating system performance, including kernel subsystem tuning (scheduler, memory management, I/O, networking), system call overhead reduction, and optimizations for user space libraries and runtimes.

• Identify and apply kernel-level enhancements, such as patches, configuration changes, and upstream contributions that yield measurable performance improvements for CIQ's customer workloads.

• Optimize workloads for AI inference and training, including LLM serving, model parallelism, and accelerator utilization.

• Fine-tune performance for HPC workloads, including modeling, simulation, and tightly coupled parallel applications (MPI, OpenMP, etc.).

• Enhance general computing and service workloads—including web services, databases, messaging systems, and other production software on CIQ's OS platform.

• Operate at all stack levels: adjusting compiler flags, kernel parameters, scheduler settings, NUMA topology, memory allocation, and application-level algorithmic improvements.

• Advocate for an AI-first engineering philosophy—utilizing AI tools, agents, and automation to enhance both personal productivity and the quality of performance insights.

• Identify and prioritize optimization opportunities that significantly affect AI training throughput and inference latency/cost.

• Remain up to date on cutting-edge techniques in ML system performance, including quantization, batching strategies, kernel fusion, and hardware-software co-design.

• Develop in-depth expertise in CIQ's Fuzzball platform, focusing on its architecture, scheduling, and workload execution model.

• Integrate performance benchmarks, regression tests, and user-facing workloads into Fuzzball-based pipelines.

• Contribute to the performance characterization of Fuzzball, ensuring minimal overhead and efficient scaling of the platform.

• Gain comprehensive familiarity with CIQ's entire product portfolio—including Rocky Linux and RLC (and its variants), Fuzzball, Apptainer (formerly Singularity), and Warewulf—understanding how performance factors interconnect across each.

• Collaborate extensively with engineering teams behind each product line to highlight, prioritize, and implement performance improvements that benefit customers throughout the CIQ ecosystem.

• Partner with product and customer success teams to translate real-world performance challenges into engineering priorities and measurable outcomes.

• Clearly document and communicate findings—from low-level profiling data to executive-level summaries.

• Contribute to technical publications, conference presentations, and thought leadership that reinforce CIQ's commitment to performance excellence.


⛳️ Requirements

• Deep, principled understanding of operating system internals, including the Linux kernel scheduler, memory subsystem, I/O stack, and networking.

• Proven track record in identifying and resolving performance regressions in both kernel and user space within production settings.

• Practical expertise with profiling and tracing tools such as perf, eBPF/bpftrace, flame graphs, VTune, Nsight, strace, ftrace, and others.

• Strong background in AI/ML workload performance, encompassing inference optimization (TensorRT, ONNX, vLLM, or similar), training efficiency, and GPU/accelerator utilization.

• Experience with HPC workloads, including MPI, OpenMP, parallel filesystems, RDMA/InfiniBand, and job schedulers (Slurm, PBS, etc.).

• Familiarity with modern AI-first development workflows and comfort in using LLM-based tools to accelerate engineering tasks.

• Experience in constructing automated performance testing and regression detection pipelines within CI/CD frameworks.

• Exceptional analytical abilities—capable of forming hypotheses, designing experiments, and deriving actionable insights from complex data.

• Strong written and verbal communication skills; adept at presenting findings to both technical audiences and business stakeholders.

• A collaborative, humble, and continuously learning mindset, paired with the confidence to advocate for performance as a primary engineering concern.


🏝️ Benefits

• Medical, dental, and vision insurance.

• Flexible paid time off.

• Employee stock options.

• Remote work; no travel required for most positions.
