This is a fully remote position, open to applicants in New York.

📋 Description

• Assess and enhance the organization's disaster recovery capabilities, including recovery time and recovery point objectives (RTO/RPO), dependency mapping, and failure domain analysis across applications, data, and infrastructure.

• Create, document, and implement disaster recovery standards and best practices for cloud infrastructure, platforms, and application architectures.

• Collaborate with SRE, platform, security, and product engineering teams to design and build resilient, fault-tolerant systems, transitioning from backup-based recovery to multi-region and active-active architectures.

• Direct the disaster recovery roadmap, weighing technical feasibility, cost, risk, and business priorities.

• Design and propose reference architectures for various disaster recovery patterns, including pilot-light, warm standby, hot standby, and active-active configurations.

• Promote the adoption of active-active disaster recovery for key systems, including traffic management, data replication, consistency models, and automated failover.

• Define and implement testing strategies for disaster recovery, such as game days, chaos testing, and routine recovery exercises.

• Ensure comprehensive documentation, runbooks, and escalation procedures are in place so that recoverability is well understood and does not depend on specific individuals.

• Assess and recommend platform upgrades, cloud services, and tools that enhance resilience, recovery speed, and reliability.

• Act as a technical authority and advisor on disaster recovery and resilience for leadership and engineering teams.

• Provide architectural guidance, conduct design reviews, and mentor engineers executing disaster recovery-related changes.

• Collaborate with security and compliance teams to confirm that disaster recovery strategies adhere to regulatory, audit, and data protection standards.

⛳️ Requirements

• Bachelor’s or Master’s degree in Computer Science or equivalent practical experience.

• Over 8 years of experience in cloud infrastructure, platform engineering, SRE, or reliability-focused architecture roles.

• In-depth knowledge of disaster recovery principles, including RTO/RPO, blast radius reduction, failure domains, and dependency isolation.

• Demonstrated experience in designing and implementing multi-region and multi-availability zone architectures.

• Practical experience transitioning systems to active-active or highly available architectures.

• Strong understanding of data replication strategies, consistency trade-offs, and recovery patterns for databases and stateful systems.

• Extensive experience with major cloud providers (AWS preferred, GCP/Azure acceptable).

• Solid understanding of managed cloud services and their disaster recovery characteristics and limitations.

• Familiarity with Kubernetes-based platforms, including regional failover, workload portability, and cluster recovery strategies.

• Knowledge of global traffic management, DNS, load balancing, and service mesh patterns.

• Experience in designing and maintaining Infrastructure as Code using tools like Terraform, Pulumi, CloudFormation, or Ansible.

• Strong emphasis on automating recovery workflows, failover testing, and environment provisioning.

• Ability to eliminate manual recovery processes and minimize recovery time through software solutions.

• Experience in defining and conducting disaster recovery tests, game days, and failure simulations.

• Comfortable collaborating across organizational boundaries to influence priorities and standards.

• Excellent documentation and communication skills, with the ability to translate complex technical risks into business impacts.

🏝️ Benefits

• Health insurance

• Remote work flexibility

• Professional development

• Paid time off

Senior Cloud Resilience Architect
