
Senior Solutions Architect – InfiniBand, Networking, Ethernet
Posted 6 days ago

This is a fully remote position, open to applicants in India.
• The main responsibilities will involve constructing AI/HPC infrastructure for both new and existing clients.
• Assist in the operational and reliability aspects of large-scale AI clusters, emphasizing performance at scale, real-time monitoring, logging, and alerting.
• Participate in and enhance the entire service lifecycle—from concept and design through to deployment, operation, and continuous improvement.
• Manage services post-launch by tracking and assessing availability, latency, and overall system health.
• Offer insights to internal teams, such as reporting bugs, documenting workarounds, and recommending enhancements.
• BS/MS/PhD or equivalent experience in Computer Science, Electrical/Computer Engineering, Physics, Mathematics, or related disciplines.
• At least 5 years of professional experience in networking fundamentals, including Ethernet or InfiniBand.
• Practical experience with network switch/router platforms such as Cumulus Linux, SONiC, IOS, Junos OS, and EOS, among others.
• Solid understanding of core principles related to Ethernet, InfiniBand, and RDMA.
• Proficient in end-to-end IB/Eth cluster deployment, adapter configuration, and firmware maintenance, with the ability to perform professional performance benchmarking using mainstream RDMA testing tools.
• Capable of independently diagnosing and resolving common IB/Eth network issues, including link flapping, connection failures, and bandwidth and latency jitter problems.
• Mastery of practical RDMA network optimization techniques such as QP tuning, MTU configuration, and congestion control.
• Hands-on experience in RDMA-accelerated business scenarios, including distributed storage and high-performance computing clusters.
• Extensive experience in delivering automated network provisioning solutions using tools like Ansible, Salt, and Python.
• Capability to develop CI/CD pipelines for network operations.
• Strong written, verbal, and listening skills in English are essential.
• NVIDIA has been at the forefront of accelerated computing.
• Our AI infrastructure drives global intelligence, revolutionizing every industry.
NVIDIA