Remotery

Senior Distinguished Engineer, AI Compute

Posted May 10

This is a fully remote position, open to applicants in California and 3 more states.

📋 Description

• Design and implement control and data plane solutions necessary for establishing a highly available, multi-tenant, large-scale, and secure machine learning platform.

• Create distributed compute engine solutions using Ray and Spark to enhance various workloads, including LLM pre-training, reinforcement learning, and extensive data processing, while optimizing compute unit economics.

• Drive systemic enhancements for operational excellence by automating KTLO (Keep The Lights On) processes.

• Oversee the technical execution of a varied project portfolio, working alongside developers whose specialties range from distributed microservices to managing large foundation models.

• Collaborate across product and program management teams, as well as with stakeholders and partners throughout Capital One, to enhance business outcomes while advancing robust technology solutions.

• Share your enthusiasm for staying informed about technological advancements, experimenting with new technologies, engaging in internal and external tech communities, and leading system design and code review sessions.

• Contribute to the growth of the Capital One Distinguished Engineering community and position yourself as a key resource in specific technologies and technology-enabled capabilities.

• Take the lead in nurturing the next generation of talent by mentoring internal staff and actively recruiting external candidates to enhance the Capital One tech talent pool.


⛳️ Requirements

• A Bachelor's degree in Computer Science, AI, Electrical Engineering, Computer Engineering, or related fields with a minimum of 10 years of experience in developing AI and ML algorithms or technologies; alternatively, a Master's degree in the same fields with at least 8 years of relevant experience.

• A minimum of 10 years of programming experience with Python, Go, Scala, or Java.

• A Master’s Degree in Computer Science or Software Engineering is preferred.

• Practical experience with the internals of Ray (Actors/GCS/Scheduling) or Spark (Query Optimizer/Memory Management) is preferred.

• Experience in building platforms that facilitate LLM training, fine-tuning, or high-throughput inference is preferred.

• Hands-on experience with AWS-specific compute services (EKS, EC2 UltraClusters, Graviton) and cost-optimization approaches is preferred.

• A track record of upstream contributions to significant distributed systems projects is preferred.


🏝️ Benefits

• A comprehensive, competitive, and inclusive array of health, financial, and additional benefits that support your overall well-being.
