
Senior Machine Learning Operations Engineer
Posted Jun 20

This is a fully remote position, open to applicants in California and 2 more states.
• Develop and manage the real-time inference service that evaluates models for the risk decision engine, prioritizing low latency and high availability.
• Take ownership of the model deployment infrastructure: registry and versioning; CI/CD processes with performance, bias, and consistency evaluations; shadow mode; and staged rollouts.
• Establish model observability: monitor availability, latency, and errors, and use drift detection to trigger retraining.
• Collaborate with Risk Data Science on a seamless development-to-production handoff, after which models run in production under MLP oversight.
• Implement experimentation features such as champion/challenger and canary routing, as well as explainability outputs like SHAP attributions.
• Exhibit a strong sense of product ownership and proactively pursue responsibilities. Our team self-organizes on small to medium projects, and we seek someone eager to help build a new platform team.
• A minimum of 5 years of experience in machine learning engineering, backend software engineering, MLOps, or a related discipline.
• Proven experience with production ML services: deploying, serving, and managing models in environments requiring low latency and high availability.
• Solid backend engineering skills in Python, including familiarity with API frameworks such as FastAPI or Flask.
• Experience with model deployment and lifecycle management tools: model registries, CI/CD for models, version control, and staged rollout strategies (shadow, canary, champion/challenger).
• Background in creating observability and alerting systems for production services, focusing on latency, errors, and ideally model-specific metrics like drift.
• Proficiency with the data infrastructure essential to ML, including SQL, key-value/low-latency stores (Redis, DynamoDB, or similar), and streaming pipelines (Kafka, Kinesis, Redpanda, or equivalent).
• Competitive salary
• Equity
• Health insurance plans
• Paid time off
• Remote work options