Remotery

Cloud Reliability Engineer – Recovery

Posted May 22

This is a fully remote position, open to applicants in India.

📋 Description

• Design and implement AWS architectures spanning multiple regions and availability zones that meet RTO/RPO objectives.

• Engineer failover patterns, both active-active and active-passive, utilizing Route 53, Global Accelerator, and CloudFront.

• Create automated disaster recovery runbooks and playbooks via AWS Systems Manager Automation and Step Functions.

• Apply chaos engineering techniques using AWS Fault Injection Service (FIS, formerly Fault Injection Simulator) to assess and validate system resiliency.

• Architect strategies for cross-region data replication across S3, DynamoDB Global Tables, RDS, and Aurora Global Database.

• Evaluate containerized workloads within Kubernetes, ensuring resilience through self-healing mechanisms, auto-scaling, and deployments across multiple clusters or regions.

• Manage AWS Backup for all services (EC2, EBS, RDS, EFS, FSx, DynamoDB, Aurora) through policy-based automation.

• Design immutable backup vaults and pipelines for backup replication across accounts and regions.

• Develop and automate procedures for data recovery testing, ensuring data integrity and compliance with defined service level agreements (SLAs).

• Implement point-in-time recovery (PITR) for databases and storage, validating through regular restore drills.

• Maintain Business Continuity Plans (BCP) and Disaster Recovery (DR) strategies, and monitor RTO (Recovery Time Objective) and RPO (Recovery Point Objective).


⛳️ Requirements

• Minimum of 5 years in cloud infrastructure, Site Reliability Engineering (SRE), or IT disaster recovery engineering roles.

• At least 3 years of practical AWS experience in large-scale production environments.

• Demonstrated success in delivering multi-region disaster recovery architectures with clearly defined and tested RTO/RPO targets.

• Expert-level knowledge of core AWS resilience services (refer to the skills matrix below).

• Strong proficiency in scripting languages such as Python, Bash, or PowerShell for automation and orchestration tasks.

• Experience with Infrastructure as Code tools: Terraform and/or AWS CloudFormation.

• Comprehensive understanding of networking fundamentals, including VPC, Transit Gateway (TGW), Direct Connect, VPN, and DNS failover mechanisms.

• Exceptional written and verbal communication skills, capable of creating executive-level disaster recovery reports.


🏝️ Benefits

• Health insurance

• Retirement plans

• Paid time off

• Flexible work arrangements

• Professional development opportunities
