Remotery

Senior Cloud Operations Engineer

Posted 3 days ago

This is a fully remote position, open to applicants in the United Kingdom.

📋 Description

• Design, implement, and manage scalable, secure, and highly available AWS cloud infrastructure utilizing services such as EC2, EKS, ECS, RDS, S3, VPC, Lambda, and IAM.

• Enhance the reliability and performance of containerized applications by overseeing Amazon EKS and ECS environments, which includes cluster operations, networking, scaling, and troubleshooting.

• Ensure the stability, security, and efficiency of production Linux environments through system administration, performance tuning, storage management, networking, and incident resolution.

• Maintain and optimize both relational databases (PostgreSQL, MySQL, Aurora) and NoSQL platforms (DynamoDB, Redis), ensuring they are highly available, performant, and ready for disaster recovery.

• Strengthen the organization's cloud security posture by effectively managing IAM, network security controls, and secrets, and by adhering to compliance best practices.

• Improve platform observability and operational excellence by implementing and enhancing monitoring, logging, alerting, and performance analytics using tools like CloudWatch, Prometheus, and Grafana.

• Take charge of production incidents by engaging in on-call rotations, leading troubleshooting efforts, conducting root cause analysis, and fostering continuous improvement initiatives.

• Collaborate closely with software engineering, DevOps, and platform teams to enhance deployment processes, application reliability, and operational efficiency.

• Identify and execute cloud cost optimization opportunities through resource right-sizing, capacity planning, automation, and governance best practices.


⛳️ Requirements

• 4–5 years of experience in a cloud operations, infrastructure engineering, or SRE role with a strong hands-on technical emphasis.

• Extensive hands-on experience with core AWS services: EC2, EKS, ECS, RDS/Aurora, S3, VPC, IAM, Lambda, CloudWatch, Route 53, and ALB/NLB.

• Demonstrated ability to design and troubleshoot complex AWS networking architectures (VPCs, subnets, transit gateways, security groups).

• Strong understanding of AWS IAM, including roles, policies, permission boundaries, and cross-account access.

• Hands-on production experience managing workloads on Amazon EKS and ECS, including cluster lifecycle, node group management, networking (CNI, service mesh basics), and autoscaling.

• Fundamental knowledge of Docker: image builds, registries (ECR), multi-stage builds, and container security.

• Strong Linux administration skills, including Bash/Python scripting, process and memory management, filesystem and storage operations, kernel parameters, and network diagnostics.

• Experience in managing and hardening Linux servers in production environments (RHEL, Ubuntu, or Amazon Linux).

• Proficient in Terraform, including module design, state management, remote backends, and workspace strategies.

• Practical experience with Puppet for configuration management, node classification, and enforcing system state at scale.

• Hands-on experience with relational databases such as PostgreSQL, MySQL, or AWS RDS/Aurora, including schema management, query optimization, replication, backups, and failover.

• Familiarity with NoSQL databases like DynamoDB, Redis, or MongoDB, including data modeling, performance tuning, and operational monitoring.

• Understanding of CI/CD pipelines using tools such as GitHub Actions, Jenkins, or AWS CodePipeline.

• Experience with observability tools, including CloudWatch, Datadog, Prometheus, or Grafana.


🏝️ Benefits

• Flexible working arrangements.

• Professional development opportunities.
