This is a fully remote position, open to applicants in the United Kingdom.

📋 Description

• Design, implement, and manage scalable, secure, and highly available AWS cloud infrastructure utilizing services such as EC2, EKS, ECS, RDS, S3, VPC, Lambda, and IAM.

• Enhance the reliability and performance of containerized applications by overseeing Amazon EKS and ECS environments, which includes cluster operations, networking, scaling, and troubleshooting.

• Ensure the stability, security, and efficiency of production Linux environments through system administration, performance tuning, storage management, networking, and incident resolution.

• Maintain and optimize both relational databases (PostgreSQL, MySQL, Aurora) and NoSQL platforms (DynamoDB, Redis), ensuring they are highly available, performant, and ready for disaster recovery.

• Strengthen the organization's cloud security posture by managing IAM, network security controls, and secrets, and by adhering to compliance best practices.

• Improve platform observability and operational excellence by implementing and enhancing monitoring, logging, alerting, and performance analytics using tools like CloudWatch, Prometheus, and Grafana.

• Take charge of production incidents by engaging in on-call rotations, leading troubleshooting efforts, conducting root cause analysis, and fostering continuous improvement initiatives.

• Collaborate closely with software engineering, DevOps, and platform teams to enhance deployment processes, application reliability, and operational efficiency.

• Identify and execute cloud cost optimization opportunities through resource right-sizing, capacity planning, automation, and governance best practices.

⛳️ Requirements

• 4–5 years of experience in a cloud operations, infrastructure engineering, or SRE role with a strong hands-on technical emphasis.

• Extensive hands-on experience with core AWS services: EC2, EKS, ECS, RDS/Aurora, S3, VPC, IAM, Lambda, CloudWatch, Route 53, and ALB/NLB.

• Demonstrated ability to design and troubleshoot complex AWS networking architectures (VPCs, subnets, transit gateways, security groups).

• Strong understanding of AWS IAM, including roles, policies, permission boundaries, and cross-account access.

• Hands-on production experience managing workloads on Amazon EKS and ECS, including cluster lifecycle, node group management, networking (CNI, service mesh basics), and autoscaling.

• Fundamental knowledge of Docker: image builds, registries (ECR), multi-stage builds, and container security.

• Strong Linux administration skills, including Bash/Python scripting, process and memory management, filesystem and storage operations, kernel parameters, and network diagnostics.

• Experience in managing and hardening Linux servers in production environments (RHEL, Ubuntu, or Amazon Linux).

• Proficient in Terraform, including module design, state management, remote backends, and workspace strategies.

• Practical experience with Puppet for configuration management, node classification, and enforcing system state at scale.

• Hands-on experience with relational databases such as PostgreSQL, MySQL, or AWS RDS/Aurora, including schema management, query optimization, replication, backups, and failover.

• Familiarity with NoSQL databases like DynamoDB, Redis, or MongoDB, including data modeling, performance tuning, and operational monitoring.

• Understanding of CI/CD pipelines using tools such as GitHub Actions, Jenkins, or AWS CodePipeline.

• Experience with observability tools, including CloudWatch, Datadog, Prometheus, or Grafana.

🏝️ Benefits

• Flexible working arrangements.

• Professional development opportunities.

Senior Cloud Operations Engineer
