Remotery

Senior Distinguished Engineer, AI Compute

Posted 11 hours ago

This is a fully remote position, open to applicants in California, +3 more states.

📋 Description

• Design and build the control-plane and data-plane implementations needed for a highly available, multi-tenant, large-scale, secure machine learning platform.

• Create solutions using Ray and Spark distributed compute engines to enhance various workloads, including LLM pre-training, reinforcement learning, and large-scale data processing, while maximizing compute unit economics.

• Implement systemic enhancements for operational excellence, such as automating KTLO (Keep The Lights On) workflows.

• Oversee the technical execution of a diverse project portfolio, working alongside developers who specialize in areas ranging from distributed microservices to large foundation models.

• Collaborate cross-functionally with product and program management teams, as well as stakeholders and partners across Capital One, to optimize business results while driving robust technology solutions.

• Share your enthusiasm for keeping up with tech trends, experimenting with and learning new technologies, and participating in both internal and external technology communities, and lead system design and code review sessions.

• Contribute to enhancing the Capital One Distinguished Engineering community and establish yourself as a reliable resource on specific technologies and technology-enabled capabilities.

• Take the initiative in developing the next generation of talent by mentoring internal staff and actively recruiting external candidates to strengthen the Capital One tech talent pool.


⛳️ Requirements

• Bachelor's degree in Computer Science, AI, Electrical Engineering, Computer Engineering, or a related field with at least 10 years of experience developing AI and ML algorithms or technologies, or a Master's degree in one of those fields with at least 8 years of relevant experience.

• A minimum of 10 years of programming experience in Python, Go, Scala, or Java.

• A Master’s Degree in Computer Science or a Master’s Degree in Software Engineering is preferred.

• Practical experience with the internals of Ray (Actors/GCS/Scheduling) or Spark (Query Optimizer/Memory Management) is preferred.

• Experience in building platforms that facilitate LLM training, fine-tuning, or high-throughput inference is preferred.

• Hands-on experience with AWS-specific compute primitives (EKS, EC2 UltraClusters, Graviton) and strategies for cost optimization is preferred.

• A proven track record of upstream contributions to significant distributed systems projects is preferred.


🏝️ Benefits

• A comprehensive, competitive, and inclusive array of health, financial, and other benefits that support your overall well-being.
