
Senior Solutions Engineer, AI Infrastructure
Posted Jun 19

This is a fully remote position, open to applicants in the United States.
• Lead the technical discovery process with clients spanning infrastructure, platform, machine learning, data, and executive stakeholders.
• Create architectural designs for extensive AI, high-performance computing (HPC), analytics, and enterprise data workloads.
• Assist clients in assessing infrastructure that includes GPUs, storage solutions, networking, orchestration, and data movement.
• Design and implement proofs of concept that confirm performance, scalability, reliability, and business value.
• Convert intricate technical requirements into straightforward solution designs, reference architectures, and deployment instructions.
• Troubleshoot customer issues across Linux, storage, networking, Kubernetes, schedulers, GPUs, and application workloads.
• Develop technical resources, demonstrations, runbooks, and field guidance for consistent customer engagements.
• Collaborate with sales on technical strategy, competitive positioning, and deal execution.
• Work alongside product and engineering teams to relay customer requirements, identify gaps, and highlight roadmap opportunities.
• Assist clients in transitioning from architecture design to production deployment.
• 8 to 12+ years of technical experience, with a strong emphasis on hands-on infrastructure expertise.
• Experience in building, operating, or architecting production platform infrastructure.
• Thorough understanding of Linux kernel implementation details, distributed systems (including consensus protocols such as Paxos and Raft), storage implementation details (such as NAND flash behavior and write amplification), store-and-forward networking, load-balancing designs, and production operations.
• Experience with one or more of the following: GPU infrastructure, large-scale HPC systems, Kubernetes platforms built from the ground up, MLOps, storage systems, cloud infrastructure, data platforms, or large-scale enterprise infrastructure.
• Ability to communicate effectively with engineers, architects, technical executives, and business stakeholders.
• Strong skills in discovery, problem-solving, and systems debugging.
• Comfort in navigating ambiguous, fast-paced environments.
• Genuine interest in customer-facing technical roles, solution design, and achieving business outcomes.
• Experience with large-scale GPU clusters, distributed training, inference infrastructure, or AI platforms.
• Knowledge of petabyte-scale storage or high-performance data systems.
• Familiarity with Kubernetes, Slurm, Ray, Spark, or other orchestration/scheduling systems.
• Domain expertise in one or more of the following: Lustre, Ceph, Weka, BeeGFS, GPFS, VAST, object storage, or distributed filesystems.
• Experience with InfiniBand, RoCE, RDMA, high-performance Ethernet, or NVIDIA/Mellanox networking.
• Direct experience with CUDA, NCCL, DCGM, GPUDirect, checkpointing, dataset staging, or model-serving infrastructure.
• Experience across various industries or customer environments.
• Competitive compensation package.
• Opportunities for professional growth and development.
• Collaborative and innovative work environment.
• Flexible working hours and remote work options.