
Senior HPC DevOps Engineer
Posted 6 days ago

This is a fully remote position, open to applicants in Germany.
• Design, implement, and oversee large-scale HPC/AI clusters.
• Utilize and create tools for managing infrastructure as code.
• Develop and sustain CI/CD pipelines.
• Create automation scripts and tools to streamline deployment processes.
• Conduct thorough troubleshooting from hardware to application level.
• Act as a technical resource, sharing and developing best practices with internal teams.
• Support R&D initiatives and participate in POCs and POVs for future enhancements.
• B.Sc. in Computer Science, Engineering, or a related discipline with over 5 years of experience.
• In-depth knowledge of HPC and AI solution technologies, including CPUs, GPUs, high-speed interconnects, and associated software.
• Advanced skills in programming and scripting languages.
• Familiarity with tools such as Jenkins, Ansible, Puppet, and Chef.
• Strong understanding of Windows and Linux (Red Hat/CentOS and Ubuntu), networking, and OS-level security.
• Comprehensive knowledge of networking protocols, including InfiniBand and Ethernet.
• Experience with job scheduling and workload orchestration tools such as Slurm and Kubernetes.
• Background in various storage solutions such as Lustre, GPFS, ZFS, and XFS.
• Proficiency with virtualization platforms including VMware, Hyper-V, KVM, and Citrix.
• Familiarity with cloud platforms such as AWS, Azure, and Google Cloud.
• NVIDIA values diversity and is dedicated to fostering an inclusive environment for all employees.
• Reasonable accommodations are provided to ensure all individuals can participate in the job application or interview process, perform essential job functions, and receive benefits.