Remotery

Principal Software Engineer – Distributed Systems Engineer, DGX Cloud

Posted May 14

This is a fully remote position, open to applicants in North Carolina.

📋 Description

• You will join the DGX Cloud team, which is responsible for the production systems that support large-scale GPU clusters for a variety of AI workloads.

• This role involves working on specialized software related to the scheduling of GPU resources on Kubernetes.

• You will be implementing monitoring and health management features that ensure industry-leading reliability, availability, and scalability of GPU assets.

• Your responsibilities will include managing multiple data streams, encompassing GPU hardware diagnostics as well as cluster and network telemetry.

• Collaboration with teams across NVIDIA will be essential to ensure that production AI clusters operate reliably and consistently at peak performance.

• You will assess system failures and enhance services based on a well-defined incident management procedure.


⛳️ Requirements

• Proven experience in a software engineering role within a highly technical environment, demonstrating significant impact from your contributions.

• Software development expertise with Kubernetes APIs and frameworks, going beyond merely operating a cluster.

• Highly driven with excellent communication skills, capable of working effectively with cross-functional teams, principals, and architects, while coordinating across organizational boundaries and geographies.

• Over 15 years of experience in a similar position, particularly with large-scale production systems.

• Familiarity with standard software engineering principles, tools, and techniques.

• A BS in Computer Science, Engineering, Physics, Mathematics, or a comparable degree, or equivalent experience.

• Technical proficiency, including knowledge of a systems programming language (Go, Python) and a solid grasp of data structures and algorithms.


🏝️ Benefits

• Equity

• Benefits
