Senior Cloud Engineer – Azure/OpenShift
Posted 23 hours ago
• Oversee the support and operation of cloud infrastructure, platform services, identity management, networking, security measures, and operational tools across customer environments.
• Design and lead the deployment of moderately complex cloud solutions.
• Understand the performance, scalability, and functional characteristics of software technologies.
• Understand open-source and cloud use cases, and recommend the standard design patterns (best practices) commonly used in these solutions.
• Take ownership of complex incidents, escalations, and problem investigations; carry out advanced troubleshooting, coordination, and service restoration, and ensure durable resolution.
• Plan and implement complex changes and routine operational tasks including provisioning, access alterations, maintenance events, backup and recovery validation, patching coordination, and platform hygiene.
• Act as a senior escalation point within the on-call rotation for major incidents, high-impact issues, and customer-approved after-hours change activities.
• Adhere to and promote established ITSM processes for incident, request, change, problem, escalation, documentation, and customer-facing status communication.
• Create and maintain runbooks, standard operating procedures (SOPs), standards, knowledge articles, and technical documentation that enhance consistency and service quality.
• Mentor other Cloud Engineers, review their work for quality and completeness, and provide technical guidance on operational best practices.
• Advocate for improvements in monitoring, alerting, logging, tagging, policy compliance, and cost visibility that enhance managed cloud operations.
• Utilize scripting, automation, and AI to minimize repetitive tasks, enhance consistency, and scale service delivery.
• General familiarity with DevOps/SRE tooling is required, though it is not the primary focus of the role.
• Engage in customer meetings, service reviews, and advisory discussions; articulate technical issues, risks, and improvement opportunities in a clear manner suitable for business communication.
• Operate and maintain Red Hat OpenShift (Kubernetes) clusters in production, including managing cluster health, upgrades, scaling, and lifecycle management.
• Oversee OpenShift access and security controls, including role-based access control (RBAC), security context constraints (SCCs), network policies, secrets management, certificates, and ingress.
• Diagnose platform and workload issues across Kubernetes/OpenShift constructs (nodes, operators, routes/ingress, services, deployments, pods, persistent volumes) and coordinate remediation with application, network, and security teams.
• Implement and validate platform backup, restore, and disaster recovery processes (e.g., etcd, cluster resources, and persistent data) in line with customer requirements.
• Support platform automation and standardization initiatives using infrastructure as code and GitOps practices (e.g., Terraform, Ansible, Helm, Argo CD) to enhance repeatability and mitigate operational risk.
• Define and enhance observability for cloud and OpenShift platforms (metrics, logs, traces), optimize alerting to reduce noise, and contribute to availability, performance, and capacity planning.
• Other job responsibilities as assigned.
• Minimum of 5 years in customer-facing IT infrastructure, cloud operations, systems administration, or managed services support within production environments.
• Strong operational proficiency in at least one major cloud platform, with the ability to lead complex support and administrative tasks in Azure.
• Experience with other cloud platforms such as GCP, AWS, and OCI is highly preferred.
• At least 3 years of experience in supporting a production OpenShift environment (on-premises, ROSA, ARO, etc.).
• Proven experience in managing complex incidents, escalations, change execution, and problem investigations in production settings.
• Familiarity with Windows and/or Linux server operations, networking fundamentals, identity and access management, monitoring, governance, and operational documentation.
• Experience in managed services, consulting, or multi-customer support environments, ideally with complex enterprise customers (preferred).
• Strong working knowledge of PowerShell, Python, Bash, infrastructure as code, automation, CI/CD, or related tools that enhance cloud operations (preferred).
• Relevant advanced cloud, operations, or platform certifications (preferred).
• Medical, Dental, and Vision Insurance
• 401(k)
• Paid company holidays
• Paid time off
• Paid parental and caregiver leave
• Plus more! See https://www.aheadbenefits.com/ for additional details.