Remotery

Senior HPC DevOps Engineer

Posted 6 days ago

This is a fully remote position, open to applicants in Germany.

📋 Description

• Design, implement, and oversee large-scale HPC/AI clusters.

• Utilize and create tools for managing infrastructure as code.

• Develop and sustain CI/CD pipelines.

• Create automation scripts and tools to streamline deployment processes.

• Conduct thorough troubleshooting from hardware to application level.

• Act as a technical resource, sharing and developing best practices with internal teams.

• Support R&D initiatives and participate in proofs of concept (POCs) and proofs of value (POVs) for future enhancements.


⛳️ Requirements

• B.Sc. in Computer Science, Engineering, or a related discipline, and more than 5 years of experience.

• In-depth knowledge of HPC and AI solution technologies, including CPUs, GPUs, high-speed interconnects, and associated software.

• Advanced skills in programming and scripting languages.

• Familiarity with tools such as Jenkins, Ansible, Puppet, and Chef.

• Strong understanding of Windows and Linux (Red Hat/CentOS and Ubuntu), networking, and OS-level security.

• Comprehensive knowledge of networking technologies, including InfiniBand and Ethernet.

• Experience with workload scheduling and orchestration tools such as Slurm and Kubernetes.

• Background in various storage solutions such as Lustre, GPFS, ZFS, and XFS.

• Proficiency with virtualization platforms, including VMware, Hyper-V, KVM, and Citrix.

• Familiarity with cloud platforms such as AWS, Azure, and Google Cloud.


🏝️ Benefits

• NVIDIA values diversity and is dedicated to fostering an inclusive environment for all employees.

• Reasonable accommodations are provided to ensure all individuals can participate in the job application or interview process, perform essential job functions, and receive benefits.
