This is a fully remote position, open to applicants in the United States.

📋 Description

• Lead the technical discovery process with clients spanning infrastructure, platform, machine learning, data, and executive stakeholders.

• Create architectural designs for extensive AI, high-performance computing (HPC), analytics, and enterprise data workloads.

• Assist clients in assessing infrastructure that includes GPUs, storage solutions, networking, orchestration, and data movement.

• Design and implement proofs of concept that confirm performance, scalability, reliability, and business value.

• Convert intricate technical requirements into straightforward solution designs, reference architectures, and deployment instructions.

• Troubleshoot customer issues across Linux, storage, networking, Kubernetes, schedulers, GPUs, and application workloads.

• Develop technical resources, demonstrations, runbooks, and field guidance for consistent customer engagements.

• Collaborate with sales on technical strategy, competitive positioning, and deal execution.

• Work alongside product and engineering teams to relay customer requirements, identify gaps, and highlight roadmap opportunities.

• Assist clients in transitioning from architecture design to production deployment.

⛳️ Requirements

• 8 to 12+ years of technical experience, with a strong emphasis on hands-on infrastructure expertise.

• Experience in building, operating, or architecting production platform infrastructure.

• Thorough understanding of Linux kernel internals, distributed systems (including consensus protocols such as Paxos and Raft), storage internals (such as NAND flash behavior and write amplification), store-and-forward networking, load-balancing designs, and production operations.

• Experience with one or more of the following: GPU infrastructure, large-scale HPC systems, Kubernetes platforms built from the ground up, MLOps, storage systems, cloud infrastructure, data platforms, or large-scale enterprise infrastructure.

• Ability to communicate effectively with engineers, architects, technical executives, and business stakeholders.

• Strong skills in discovery, problem-solving, and systems debugging.

• Comfort in navigating ambiguous, fast-paced environments.

• Genuine interest in customer-facing technical roles, solution design, and achieving business outcomes.

• Experience with large-scale GPU clusters, distributed training, inference infrastructure, or AI platforms.

• Knowledge of petabyte-scale storage or high-performance data systems.

• Familiarity with Kubernetes, Slurm, Ray, Spark, or other orchestration/scheduling systems.

• Domain expertise in one or more of the following: Lustre, Ceph, Weka, BeeGFS, GPFS, VAST, object storage, or distributed filesystems.

• Experience with InfiniBand, RoCE, RDMA, high-performance Ethernet, or NVIDIA/Mellanox networking.

• Direct experience with CUDA, NCCL, DCGM, GPUDirect, checkpointing, dataset staging, or model-serving infrastructure.

• Experience across various industries or customer environments.

🏝️ Benefits

• Competitive compensation package.

• Opportunities for professional growth and development.

• Collaborative and innovative work environment.

• Flexible working hours and remote work options.

Senior Solutions Engineer, AI Infrastructure
