
DevOps Engineer – ML & Data Infrastructure
Posted 7 hours ago

This is a fully remote position, open to applicants in the United States.
• Oversee, configure, and automate cloud infrastructure using tools such as Terraform and Ansible.
• Develop CI/CD pipelines for machine learning models and data workflows, emphasizing automation, version control, rollback, and monitoring with tools like Vertex AI, Jenkins, and Datadog.
• Create and maintain scalable data and feature pipelines for both real-time and batch processing, using BigQuery, Bigtable, Dataflow, Composer, Pub/Sub, and Cloud Run.
• Establish infrastructure for model monitoring and observability, identifying drift, bias, and performance issues through Vertex AI Model Monitoring and custom dashboards.
• Enhance inference performance, focusing on reducing latency and improving cost-efficiency of AI workloads.
• Maintain overall system reliability, scalability, and performance across the ML/Data platform.
• Define and enforce infrastructure best practices for deployment, monitoring, logging, and security.
• Diagnose and resolve complex issues impacting ML/Data pipelines and production systems.
• Ensure adherence to data governance, security, and regulatory standards, particularly in real-money gaming environments.
• Mentor and lead DevOps engineers, assisting in technical decision-making and operational processes.
• Aid in sprint planning, task prioritization, and cross-functional collaboration across infrastructure and platform efforts.
• Perform code reviews, share best practices, and help foster a high-performing engineering culture.
• Work closely with ML, Data, Product, and Security teams to ensure infrastructure strategy aligns with business goals.
• A minimum of 5 years of experience as a DevOps Engineer, preferably with a focus on ML and Data infrastructure.
• Proven experience in leading projects, mentoring engineers, or managing technical teams.
• Extensive hands-on experience with Google Cloud Platform (GCP), particularly with BigQuery, Dataflow, Vertex AI, Cloud Run, and Pub/Sub.
• Proficiency in Terraform; Ansible skills are a bonus.
• Strong understanding of containerization (Docker, Kubernetes) and orchestration platforms such as GKE.
• Experience in building and maintaining CI/CD pipelines, ideally with Jenkins.
• Comprehensive understanding of monitoring and logging best practices for cloud and data systems.
• Proficiency in scripting languages such as Python, Groovy, or Shell.
• Familiarity with AI orchestration frameworks like LangGraph or LangChain is advantageous.
• Excellent communication, collaboration, and stakeholder management abilities.
• Bonus points if you have experience in gaming, real-time fraud detection, or AI-driven personalization systems.
• Competitive salary and comprehensive benefits package.
• Opportunities for professional growth and development.
• Collaborative and inclusive work environment.
• Flexible work arrangements to support work-life balance.