Remotery

Senior AI Platform Engineer

at Expel · United States · Full-time · Platform Engineer · Senior · $142.9k – $207.2k/year

Posted 10 hours ago

This is a fully remote position, open to applicants in the United States.

📋 Description

• Design and oversee comprehensive machine learning training pipelines on AWS (SageMaker, EKS, Step Functions) to guarantee consistent and reproducible model development and deployment.

• Develop and sustain infrastructure for production agentic applications utilizing Amazon Bedrock and Bedrock AgentCore — encompassing agent runtimes, memory, secure gateways, and large-scale observability.

• Participate in the architectural advancement of our ML platform, including assessing MLOps tools and engaging in buy vs. build evaluations.

• Apply AI/ML governance best practices for model versioning, testing, validation, maintenance, and security.

• Align MLOps best practices with Expel's SDLC, security, and infrastructure benchmarks, collaborating with SRE, Platform Engineering, and Security teams.

• Enhance quality, reliability, and scalability through strategic engineering and monitoring.

• Collaborate with data scientists, software engineers, and stakeholders to ensure the reliable and scalable operationalization of ML models.

• Guide and assist junior engineers; promote a culture of engineering excellence.

• Develop and maintain documentation, internal tools, and enablement resources to empower practitioners across Expel in working effectively with ML systems.

• Keep abreast of the MLOps landscape and bring relevant innovations back to the team.


⛳️ Requirements

• A minimum of 5 years of relevant software engineering experience with a significant emphasis on ML operations and infrastructure.

• A degree in Computer Science, Mathematics, Statistics, Engineering, or a related technical field is preferred (or a compelling narrative).

• Proficient in Python; familiarity with additional languages (Go, JS) is advantageous.

• Extensive experience with CI/CD pipelines, infrastructure-as-code, and containerization tailored for ML workloads.

• Practical experience with cloud-based ML platforms — AWS (SageMaker, Bedrock, Bedrock AgentCore) is strongly preferred; experience with GCP (Vertex AI) is also appreciated.

• Demonstrated experience in operationalizing LLMs and constructing infrastructure for intricate agentic applications — including agent orchestration, memory, tool calling, and RAG architectures.

• Familiarity with ML frameworks such as Scikit-Learn, PyTorch, Spark, and TensorFlow.

• Knowledge of continuous retraining, concept drift monitoring, and data drift detection in production environments.


🏝️ Benefits

• Unlimited PTO (which leadership actively models and encourages).

• Up to 24 weeks of parental leave.

• Excellent health benefits.

• Monthly stipends for fitness and cell phone expenses — no receipts needed.

• Professional development support, including conference benefits and ongoing learning opportunities.

• Full remote flexibility — work from wherever you perform best.
