Remotery

Senior Cloud - Kubernetes SRE

Posted 9 hours ago

This is a fully remote position, open to applicants in the United Kingdom.

📋 Description

• Manage, enhance, and scale production OKD / Kubernetes clusters in both on-premises and hybrid settings.

• Assist in the transition from VMware to KVM, contributing to the modernization of the underlying compute and storage infrastructure.

• Take ownership of and refine CI/CD processes throughout the entire lifecycle of platform and application components.

• Collaborate with platform and application engineers to facilitate cloud-native delivery using tools like Helm and Kustomize.

• Develop and advance GitOps deployment practices utilizing tools such as Argo CD or Flux.

• Oversee and enhance core platform services, including identity management, ingress, observability, certificate management, service mesh, and container registry capabilities.

• Construct and maintain observability frameworks encompassing logs, metrics, traces, alerting, SLOs, and error budgets.

• Strengthen platform security in accordance with secure and regulated environment standards, addressing network policy, SELinux, image provenance, secret management, and auditing.

• Automate repetitive operational tasks through tools such as Ansible, Terraform, Helm, Kustomize, Go, Python, or similar technologies.

• Spearhead incident response efforts, support blameless post-mortems, and advocate for systemic solutions.

• Collaborate with networking and security teams on platform integration, segmentation, load balancing, and accreditation documentation.

• Produce and maintain comprehensive technical documentation, runbooks, design notes, and operational guidance.

• Mentor fellow engineers and serve as a senior technical authority in cloud and Kubernetes operations.

• Participate in an on-call rotation, with appropriate compensation.


⛳️ Requirements

• Extensive experience managing production Kubernetes environments, not merely deploying into them.

• Strong foundational knowledge of Linux, including systemd, networking, storage, and performance troubleshooting.

• Familiarity with at least one Kubernetes distribution such as OKD, OpenShift, vanilla Kubernetes, Rancher, EKS, AKS, or GKE.

• Experience with infrastructure as code and automation tools, such as Ansible, Terraform, Helm, or Kustomize.

• Proficiency with GitOps tools such as Argo CD or Flux in production settings.

• Proven experience in building or managing CI/CD pipelines for platform or application components.

• Strong expertise in observability, covering logs, metrics, and traces, with tools like Prometheus, Grafana, Elastic Stack, OpenTelemetry, or similar.

• Experience with identity and access management technologies, including OIDC, SAML, SCIM, or Keycloak.

• Background in virtualization or infrastructure platforms such as KVM, libvirt, or VMware.

• Proficiency in scripting or tooling with Go, Python, shell scripting, or similar languages.

• Excellent troubleshooting, problem-solving, and analytical abilities.

• Experience within secure, regulated, or enterprise-scale environments.

• Strong communication skills, including the ability to produce clear documentation, runbooks, post-mortems, and technical guidance.

• Eligibility to obtain UK SC clearance.

• Specific experience with OpenShift or OKD, including operators, MachineConfig, or SCCs. (Desirable)

• Familiarity with service mesh technologies such as Istio or Linkerd. (Desirable)

• Knowledge of policy engines like OPA, Gatekeeper, or Kyverno. (Desirable)

• Experience in cloud-native application deployment using Helm, Terraform, Kustomize, or similar tools. (Desirable)

• Experience using GitOps and CI/CD to manage complete application and component lifecycles. (Desirable)

• Background in storage solutions like Ceph, Longhorn, OpenShift Data Foundation, or equivalent. (Desirable)

• Networking experience with technologies such as BGP, VXLAN, Palo Alto, or Juniper. (Desirable)

• Familiarity with software supply chain security, including SBOMs, image signing, admission control, or tools like Sigstore. (Desirable)

• Experience in operating AI, ML, or GPU-enabled platforms. (Desirable)

• CKA, CKAD, CKS, Red Hat certifications, or equivalent qualifications. (Desirable)

• Active or recent UK SC clearance. (Desirable)

• Recognized contributions to the Kubernetes open-source community. (Desirable)


🏝️ Benefits

• Private Medical

• Health Cash Plan

• 4x Life Assurance

• Inclusive Culture: Enjoy an inclusive culture and environment.

• Holiday: Generous holiday allowance.

• Learning: Access to continuous learning and development opportunities.

• Bonus Potential: Based on performance and business-related factors.

• Discounts: Discounts on a wide range of products and services.

• Pension: Pension scheme contributions.

• EV Car Scheme

• Regular Pay Reviews

• More Benefits: Explore additional benefits on our career site.
