
Senior Cloud Engineer – Azure/OpenShift
Posted 1 hour ago

This is a fully remote position, open to applicants in the United States.
• Oversee the support and operation of cloud infrastructure, platform services, identity, networking, security measures, and operational tools across customer environments.
• Capable of designing and leading the deployment of moderately complex cloud solutions.
• Possesses an understanding of the performance, scaling, and functional traits of software technologies.
• Able to understand open-source and cloud use cases and recommend the standard design patterns (best practices) typically employed in such solutions.
• Manage complex incidents, escalations, and problem investigations; conduct advanced troubleshooting, coordination, service restoration, and ensure durable resolution.
• Plan and implement complex changes and recurring operational tasks, including provisioning, access modifications, maintenance events, backup and recovery validation, patching coordination, and platform hygiene.
• Act as a senior escalation point within the on-call rotation for major incidents, high-impact issues, and customer-approved after-hours change activities.
• Adhere to and reinforce established ITSM processes for incident, request, change, problem, escalation, documentation, and customer-facing status communication.
• Create and maintain runbooks, SOPs, standards, knowledge articles, and technical documentation that enhance consistency and service quality.
• Mentor fellow Cloud Engineers, review their work for quality and completeness, and provide technical guidance on operational best practices.
• Drive improvements in monitoring, alerting, logging, tagging, policy, compliance, and cost visibility that enhance managed cloud operations.
• Use scripting, automation, and AI to reduce repetitive tasks, enhance consistency, and scale service delivery.
• General familiarity with DevOps/SRE tooling is necessary but is not the main focus of the role.
• Participate in customer meetings, service reviews, and advisory discussions; translate technical issues, risks, and opportunities for improvement into clear business communication.
• Operate and support Red Hat OpenShift (Kubernetes) clusters in production, ensuring cluster health, conducting upgrades, scaling, and managing the lifecycle.
• Manage OpenShift access and security controls, including RBAC, SCCs, NetworkPolicies, secrets management, and considerations for certificates/ingress.
• Troubleshoot platform and workload issues across Kubernetes/OpenShift constructs (nodes, operators, routes/ingress, services, deployments, pods, persistent volumes) and coordinate remediation with application, network, and security teams.
• Implement and validate platform backup, restore, and disaster recovery procedures (e.g., etcd, cluster resources, and persistent data) in line with customer requirements.
• Support platform automation and standardization efforts utilizing infrastructure as code and GitOps practices (e.g., Terraform, Ansible, Helm, Argo CD) to enhance repeatability and mitigate operational risk.
• Define and enhance observability for cloud and OpenShift platforms (metrics, logs, traces), adjust alerting to minimize noise, and contribute to availability, performance, and capacity planning.
• Other job responsibilities as assigned.
• 5+ years of experience in customer-focused IT infrastructure, cloud operations, systems administration, or managed services support, including work in production settings.
• Strong operational proficiency in at least one major cloud platform, with the capability to lead complex support and administration activities in Azure.
• Experience with other cloud platforms such as GCP, AWS, and OCI is highly preferred.
• At least 3 years of experience supporting a production OpenShift environment (on-premises, ROSA, ARO, etc.).
• Proven experience leading complex incidents, escalations, change execution, and problem investigations within production environments.
• Familiarity with Windows and/or Linux server operations, networking fundamentals, identity and access management, monitoring, governance, and operational documentation.
• Experience in a managed services, consulting, or multi-customer support environment, ideally with complex enterprise customers (preferred).
• Strong working knowledge of PowerShell, Python, Bash, infrastructure as code, automation, CI/CD, or related platform tools used to enhance cloud operations (preferred).
• Relevant advanced cloud, operations, or platform certifications (preferred).
• Medical, Dental, and Vision Insurance
• 401(k)
• Paid company holidays
• Paid time off
• Paid parental and caregiver leave
• Plus more! See https://www.aheadbenefits.com/ for additional details.