
Data & AI Operations Specialist
Posted May 20

This is a fully remote position, open to applicants in India.
• Design & Architecture: Oversee the monitoring framework for AI/ML platforms and set up sophisticated dashboards using Grafana and Azure Monitor.
• Environment Governance: Administer Azure Machine Learning (AML) workspace setups, compute targets, and manage the lifecycle of Databricks clusters (including runtime versioning and platform updates).
• Resource Optimization: Manage GPU resource distribution, reserved capacity, and cost-performance enhancements to comply with FinOps objectives.
• Security Integration: Ensure that all AI services use private endpoints, VNET integration, and RBAC controls to protect sensitive citizen information.
• Pipeline Engineering: Own the design, optimization, and debugging of Azure Data Factory (ADF) and Synapse pipelines.
• Advanced Troubleshooting: Resolve complex issues involving authentication failures, data format changes, and ETL performance bottlenecks.
• SOP Leadership: Create detailed Standard Operating Procedures (SOPs) for the L1 NOC team to manage routine monitoring and initial triage effectively.
• Automation: Establish CI/CD pipelines for model training, testing, and deployment to AML endpoints.
• Model Reliability: Set up data drift detection thresholds and automated retraining triggers.
• Recovery Operations: Design self-healing scripts and automated recovery runbooks for essential AI workflows.
• Audit Management: Implement and sustain audit logging for all AI decisions and model outputs, ensuring that logs are directed to the SIEM/vSOC.
• Regulatory Alignment: Conduct quarterly AI governance assessments to ensure compliance with NESA standards and data privacy regulations.
• AI/ML Platforms: Extensive knowledge of Azure Machine Learning and Databricks.
• Data Integration: Skilled in Azure Data Factory and Synapse.
• Infrastructure-as-Code (IaC): Proficient with Terraform or ARM Templates for consistent deployments.
• Observability: Ability to use Dynatrace, Grafana, and Azure Monitor for in-depth diagnostics.
• Containerization: Understanding of AKS, Istio Service Mesh, and KEDA.
• ITIL Mastery: Strong understanding of ITIL-aligned Incident, Change, and Problem management.
• Security Mindset: Familiarity with NESA standards and UAE data residency requirements.
• Technical Writing: Ability to write detailed SOPs and to deliver Root Cause Analysis (RCA) documents within 48 hours of an incident.
• Certifications: Microsoft Azure Data Scientist Associate or Azure AI Engineer Associate is highly valued.
• Competitive salary and compensation package.
• Opportunities for professional development and certification.
• Collaborative and supportive work environment.
• Flexible working hours and remote work options.
10x.Team
Anyone AI