
Senior Staff Engineer, Lustre
Posted 1 day ago

This is a fully remote position, open to applicants in California.
• Provide deep technical direction across LustreFS subsystems, including llite, MDS/MDT, OSS/OST, LDLM, recovery, and LNet.
• Own root-cause analysis for difficult customer, scale, and production issues spanning the kernel, filesystem, network, and transport layers.
• Lead the design and implementation of new features, reliability enhancements, scalability improvements, and performance optimizations in LustreFS.
• Lead architectural reviews of kernel-space and user-space changes, with a strong focus on correctness, backward compatibility, and operability.
• Establish debugging and observability strategies for complex distributed failure scenarios, including failover, recovery storms, lock contention, and transport degradation.
• Collaborate with principal engineers, support teams, QE, DevOps, and release teams to improve product quality, testing depth, and release confidence.
• Mentor senior and mid-level engineers; develop structured learning pathways, review standards, and subsystem ownership models to build redundancy.
• Advocate for the adoption of AI-assisted workflows for issue triage, log analysis, code review support, knowledge retention, and design acceleration, with appropriate engineering guardrails.
• Over 15 years of experience in distributed systems, filesystems, Linux kernel development, or storage infrastructure engineering.
• In-depth, hands-on knowledge of LustreFS internals and production operations, including one or more of the following: metadata services, object storage services, client/llite, locking, recovery, or LNet.
• Strong C systems programming skills and extensive Linux debugging experience with tools such as gdb, crash, perf, ftrace, eBPF, SystemTap, and core dump analysis.
• Strong understanding of Linux kernel concurrency, memory management, I/O paths, networking, and performance optimization.
• Experience with high-performance networking and transport technologies such as InfiniBand, RDMA, RoCE, and/or TCP at scale.
• Demonstrated ability to diagnose complex cross-layer issues that involve the kernel, storage, networking, and distributed coordination.
• Experience leading design discussions and code reviews, and making subsystem-level technical decisions.
• Excellent written and verbal communication skills, with the ability to guide senior technical audiences and influence cross-functional teams.
• Health insurance
• 401(k) matching
• Flexible work hours
• Paid time off
• Remote work options