Remotery

Azure CloudOps Engineer

Posted Jun 19

This is a fully remote position, open to applicants in the United States.

📋 Description

• Oversee and provide support for Azure infrastructure across development, QA, staging, and production environments.

• Ensure the operational health of Static Web Apps, Container Apps, PostgreSQL, Storage Accounts, SignalR, Service Bus, Azure AI Foundry, Azure Arc, and associated services.

• Guarantee that resources are provisioned, configured, monitored, maintained, and decommissioned according to company standards.

• Assist in setting up environments for new products, clients, and integrations.

• Identify and rectify infrastructure issues that impact performance, reliability, availability, or security.

• Develop and manage Terraform modules and environmental configurations.

• Ensure that infrastructure changes are version-controlled, peer-reviewed, tested, and authorized.

• Handle Terraform state, workspaces, variables, secrets, and deployment workflows.

• Identify and resolve discrepancies between Terraform and deployed Azure resources.

• Standardize naming conventions, tagging, resource group structures, environment isolation, and module patterns.

• Create, maintain, and troubleshoot GitHub Actions workflows for application and infrastructure deployments.

• Facilitate CI/CD pipelines across various SaaS products and environments.

• Implement promotion processes from development to QA to staging to production.

• Introduce deployment safeguards such as environment protection rules, approvals, rollback procedures, validation checks, release gates, and audit trails.

• Oversee pipeline secrets, service principals, managed identities, and deployment credentials.

• Enhance build and deployment reliability, speed, and traceability.

• Operate and monitor Azure AI services, including Azure AI Foundry and Speech-to-Text workloads.

• Support production operations for LLM integrations and AI-enabled product features.

• Monitor AI service availability, latency, quota usage, token consumption, API failures, throttling, and costs.

• Assist in defining operational standards for AI workloads: access control, logging, alerting, failover, usage governance, and provider disruption management.

• Collaborate with engineering teams to troubleshoot AI service issues, integration failures, degraded model responses, or provider-side disruptions.

• Ensure secure handling of AI secrets, endpoints, keys, managed identities, and private network access.

• Implement and maintain monitoring systems with Azure Monitor, Log Analytics, Application Insights, and related tools.

• Develop dashboards for infrastructure, application, database, messaging, storage, AI service, and deployment health.

• Configure alerts for availability, latency, errors, resource saturation, queue depth, job failures, deployment failures, database health, quota exhaustion, and cost anomalies.

• Enhance signal quality by minimizing noise and ensuring alerts are actionable.

• Participate in production incident response for infrastructure, deployments, integrations, and platform services.

• Triage and resolve issues across Azure services, CI/CD, Terraform, networking, databases, messaging, and AI integrations.

• Develop and maintain runbooks for common operational challenges.

• Assist in root cause analyses and post-incident evaluations.

• Implement preventive measures after incidents to improve reliability.

• Help define severity levels, escalation paths, response expectations, on-call processes, and production support protocols.

• Enforce cloud security best practices across Azure environments.

• Manage Azure RBAC, managed identities, service principals, Key Vault access, and least-privilege permissions.

• Secure GitHub Actions workflows, deployment credentials, environment secrets, and production access.

• Aid in secret rotation, certificate management, and secure configuration management.

• Enforce network security through private endpoints, firewalls, IP restrictions, and environment-specific access rules.

• Support audit and compliance readiness for SOC 2, ISO 27001, or similar frameworks.

• Assist with Azure PostgreSQL operations: backups, restores, performance monitoring, connection limits, high availability, and capacity planning.

• Monitor and maintain Azure Storage Accounts, lifecycle policies, access controls, backup strategies, and usage trends.

• Support Azure Service Bus operations: queue/topic monitoring, dead-letter handling, retry behavior, and throughput.

• Ensure the operational health of SignalR, including connection metrics, scaling behavior, and related production issues.

• Monitor Azure expenditures across products, environments, services, and, where applicable, customers.

• Implement tagging standards to facilitate cost allocation by product, environment, customer, or business unit.

• Create cost dashboards, budget alerts, and anomaly detection, and conduct recurring cost reviews.

• Identify underutilized resources and recommend right-sizing opportunities.

• Review AI service costs, LLM and token usage, Speech-to-Text usage, storage growth, database sizing, and environment costs.

• Suggest savings plans, reservations, scaling rules, lifecycle policies, or shutdown schedules.

• Define and maintain backup and recovery procedures for critical cloud services.

• Test database restores and validate backup reliability.

• Help define recovery time objectives (RTOs) and recovery point objectives (RPOs) for production systems.

• Support disaster recovery planning for SaaS products and customer-facing services.

• Enhance resilience through scaling rules, failover patterns, health checks, synthetic monitoring, and production readiness assessments.

• Create and maintain CloudOps documentation, runbooks, deployment guides, and environment standards.

• Establish standards for naming, tagging, logging, alerting, access control, Terraform structure, GitHub Actions patterns, and production modifications.

• Document procedures for cloud services, CI/CD workflows, AI services, and incident response.

• Empower engineering teams with reusable patterns, templates, and self-service guidance.


⛳️ Requirements

• Over 7 years of hands-on experience managing production workloads in Microsoft Azure.

• Extensive experience with Terraform and infrastructure as code.

• Proven experience in building and maintaining CI/CD pipelines using GitHub Actions.

• Familiarity with containerized workloads, preferably Azure Container Apps or similar technologies.

• Experience with Azure Monitor, Log Analytics, and Application Insights.

• Proficiency in Azure PostgreSQL or similar managed relational database services.

• Strong grasp of Azure networking, DNS, identity management, RBAC, managed identities, Key Vault, and security best practices.

• Experience in troubleshooting production incidents across infrastructure, deployments, networking, and cloud services.

• Proficient in scripting with Bash, PowerShell, Python, or similar languages.

• Excellent documentation, communication, and cross-functional collaboration skills.


🏝️ Benefits

• Competitive salary based on experience.

• Opportunities for career growth and professional development.

• Work with a diverse, global team in a remote environment.
