Remotery

Senior MLOps Engineer

Posted 20 hours ago

This is a fully remote position, open to applicants in Brazil.

📋 Description

• Take ownership of the ML serving API that delivers NBA recommendations, working with the data engineer who built it, and harden it for low-latency production traffic.

• Build the first repeatable deployment pipeline: model artifact → versioned, deployable, rollback-able production service, with infrastructure defined as code.

• Implement the monitoring foundation: latency/error/drift dashboards, alerting, and audit/trace visibility across models and agents.

• Develop a professional relationship with HAL and become the data team's primary resource for ML serving and reliability decisions.

• Act as the main owner (with support from the data engineer) of the ML serving platform and deployment pipelines for NBA and our other ML models.

• Ensure at least one production model and one production agent are fully instrumented, with versioning, monitoring, alerting, and multi-tenant gating in place.

• Write the data team's end-to-end playbook for deploying a new ML model to production.

• Influence architectural decisions across APIs, processing pipelines, distributed computing, storage, search, observability, cloud infrastructure, and model-serving workflows.

• Guide data engineers on MLOps patterns so they can confidently support and enhance the systems you manage.

• Operate as the technical lead within the data team for NBA production ML operations — the go-to person for other teams when they seek to understand how Clutch ships and operates ML reliably.

• Demonstrate measurable improvements in cost and latency.

• Shape the data team's roadmap for the next generation of ML infrastructure, in collaboration with the PM and data scientist.

• Assist in determining future hiring needs as the team expands.


⛳️ Requirements

• 8+ years of experience in software, data, or ML engineering, including 4–5+ years running ML systems in production. You've taken models from prototype to production and you own the outcomes after deployment.

• Proficient in Python. Most of the work (serving API, pipelines, tooling, data pipelines) is done in Python, and you are comfortable in production codebases, not just notebooks. Some TypeScript is involved for integration with our agent runtime; you don't need to be an expert, but familiarity with a second language is necessary.

• Experience with CI/CD & deployment practices. You build training and deployment pipelines that take a model artifact to a versioned, deployable, rollback-able production service, with automated testing and reproducible builds. You have implemented CI/CD for ML and built and maintained pipelines in GitHub Actions, Bamboo, GitLab CI, or similar.

• Familiarity with infrastructure as code. You manage cloud infrastructure (AWS Lambda, ECS) with Terraform or a similar tool: no click-ops, and everything is reviewable and reproducible.

• Adept in monitoring & observability practices. You instrument serving systems for latency, error rates, drift, and cost; you analyze audit logs and distributed traces; you configure alerting to catch regressions before they impact users. You treat monitoring as a first-class deliverable, not an afterthought.

• Strong reliability practices. You design for failure: structured error handling, graceful degradation, rollback paths, and runbooks. You can walk us through a production incident you handled and how you hardened the system afterward.

• Experience in building and operating low-latency production APIs (FastAPI, BentoML, or equivalent), with well-informed opinions on serving, batching, and caching.

• Comfortable working in AWS (especially Lambda), containers (Docker), and GitHub-based workflows.

• Knowledge of security & governance. You apply IAM, KMS, access policies, and Secrets Manager/SSM to keep systems secure and well governed.

• Proficient in DevOps and infrastructure, as well as data manipulation and feature engineering.

• Strong understanding of ML concepts: models, pipelines, metrics, and supervised/unsupervised learning.

• Capable of integrating and optimizing AI/ML services with the company's other systems.

• Actively use AI tools in your engineering workflow — not as a novelty, but as a standard practice. You will be expected to demonstrate this during the technical evaluation.

• Experience with Databricks and PySpark.


🏝️ Benefits

• Remote Flexibility: Enjoy the freedom of remote work from anywhere, balancing life and career seamlessly.

• Unforgettable Off-Sites: Twice a year, bond with colleagues in exciting destinations, fostering teamwork and fresh ideas.

• Paid Time Off and National Holidays: Enjoy 20 PTO days per year plus national holidays to rest and recharge.

• Stock Options: Joining us means having a stake in our success, so you'll receive stock options as part of your compensation package.

• Home Office Setup: Create your ideal workspace with a dedicated budget for home office essentials.

• Work Trip Budget: Grow personally and professionally with a budget for work-related trips and co-working.

People also viewed

Flock Safety · 8 hours ago

Full Stack Engineer, Machine Learning Tooling

New York only · Full-time · Machine Learning Engineer · $145k – $165k/year

Inspiren · 8 hours ago

Senior Machine Learning Engineer

New York only · Full-time · Machine Learning Engineer · $200k – $230k/year

OneStudyTeam · 8 hours ago

Senior Machine Learning Engineer

United States only · Full-time · Machine Learning Engineer · $140k – $190k/year

CDW · 8 hours ago

Senior ML, MLOps Engineer

United States only · Full-time · Machine Learning Engineer

Extend · 10 hours ago

Manager, Machine Learning

United States only · Full-time · Machine Learning Engineer · $180k – $210k/year

CD PROJEKT SA · 10 hours ago

Machine Learning, Game Tech Architect

Canada only · Full-time · Machine Learning Engineer · $180.1k – $247.6k/year
