Remotery

Principal Software Engineer – Distributed Systems Engineer, DGX Cloud

Posted Jun 20

This is a fully remote position, open to applicants in North Carolina.

📋 Description

• Join the DGX Cloud team, where you will contribute to production systems that support large, scalable GPU clusters for diverse AI workloads.

• Your role will involve developing custom software aimed at optimizing GPU resource scheduling on Kubernetes.

• You will implement monitoring and health management features to ensure exceptional reliability, availability, and scalability of GPU resources.

• The position requires you to manage multiple data streams, including GPU hardware diagnostics and cluster and network telemetry.

• Collaborate with teams across NVIDIA to guarantee that production AI clusters operate reliably and consistently at peak performance.

• Assess system failures and enhance services following a clearly defined incident management process.


⛳️ Requirements

• Proven experience in a software engineering position within a highly technical organization, demonstrating the impact of your contributions.

• Proficiency in software development using Kubernetes APIs and frameworks, rather than merely managing a cluster.

• Strong motivation and excellent communication skills, with the ability to collaborate with cross-functional teams, principals, and architects and to coordinate effectively across organizational boundaries and locations.

• At least 15 years of experience in a similar role, particularly with large-scale production systems.

• Familiarity with standard software engineering principles, tools, and techniques.

• A Bachelor’s degree in Computer Science, Engineering, Physics, Mathematics, or a comparable discipline, or equivalent experience.

• Technical expertise, including knowledge of a systems programming language (Go, Python) and a strong understanding of data structures and algorithms.


🏝️ Benefits

• Equity

• Benefits
