
Senior ML Ops Engineer
Posted May 7

This is a fully remote position, open to applicants in Connecticut and 3 more states.
• Automate and orchestrate machine learning workflows across leading cloud and AI platforms, including AWS, Azure, Databricks, and foundational model APIs such as OpenAI.
• Maintain and version model registries and artifact stores to guarantee reproducibility and compliance.
• Develop and oversee CI/CD for machine learning, incorporating automated data validation, model testing, and deployment.
• Implement ML Engineering solutions utilizing popular MLOps platforms like AWS SageMaker, MLflow, and Azure ML.
• Scale end-to-end custom SageMaker pipelines.
• Design and implement the engineering components of GAR+RAG systems (e.g., query interpretation and reflection, chunking, embeddings, hybrid retrieval, semantic search), and manage prompt libraries, guardrails, and structured outputs for LLMs hosted on Bedrock/SageMaker or self-hosted.
• Design and build ML pipelines that leverage Elasticsearch/OpenSearch/Solr, vector databases, and graph databases.
• Build evaluation pipelines that include offline IR metrics (NDCG, MAP, MRR), LLM quality metrics (faithfulness, grounding), and A/B testing.
• Optimize infrastructure costs through monitoring, scaling strategies, and effective resource utilization.
• Stay current with the latest research in GAI, NLP, and RAG, and apply state-of-the-art techniques in our experiments and systems.
• Collaborate with Subject-Matter Experts, Product Managers, Data Scientists, and Responsible AI experts to convert business challenges into cutting-edge data science solutions.
• Work closely with Operations Engineers who deploy and manage production infrastructure.
• Current experience in ML Engineering and MLOps platforms, with a proven track record of deploying ML or search/GenAI systems into production.
• Strong proficiency in Python, Java, and/or Scala is a significant advantage.
• Hands-on experience with major cloud vendor solutions (AWS, Azure, and/or Google Cloud).
• Familiarity with search/vector/graph technologies (e.g., Elasticsearch, OpenSearch, Solr, Neo4j).
• Experience evaluating LLMs.
• A solid understanding of the Data Science Life Cycle, including feature engineering, model training, and evaluation metrics.
• A background in health technology and/or medical content workflows is preferred.
• Familiarity with ML frameworks such as PyTorch, TensorFlow, and PySpark.
• Experience with large-scale data processing systems like Spark.
• Knowledge of statistical analysis, machine learning theory, and natural language processing.
• This position is eligible for an annual incentive bonus.
• We are pleased to provide country-specific benefits.