
Staff Replication Development Engineer
Posted 3 days ago

• Take the lead in the design and development of the replication engine for the Infinia AI Data Platform.
• Concentrate on creating enterprise-level asynchronous replication features that facilitate reliable and secure disaster recovery for extensive data systems.
• Construct high-performance replication pipelines, efficient data synchronization methods, and secure data transfer systems.
• Design and develop multi-threaded asynchronous replication systems with capabilities for parallel streaming.
• Create object-level delta replication with checkpointing and resume functionality.
• Develop replication engines that support bucket/share-level replication controls.
• Implement secure data transfer mechanisms utilizing TLS 1.3 with mutual authentication.
• Ensure comprehensive data integrity through checksum validation and verification processes.
• Design and implement manual failover workflows tailored for disaster recovery situations.
• Build and maintain REST APIs for replication configuration, management, and automation.
• Develop systems for metadata tracking and change detection to facilitate efficient replication.
• Implement RPO visibility, alerting, and operational insights into replication status.
• Contribute to monitoring dashboards that emphasize replication health and performance.
• Ensure that systems are designed for high availability, fault tolerance, and scalability.
• Collaborate with QA teams to enhance performance, resiliency, and scale validation.
• Work alongside backend, security, and platform teams to deliver comprehensive replication workflows.
• Engage in debugging, resolving production issues, and continuously improving replication reliability.
• Provide technical leadership, architectural direction, and mentorship to the engineering team.
• A minimum of 8 years of experience in distributed systems, storage systems, or backend software engineering.
• Strong programming skills in one or more of the following languages: C++, Go, Java, or Rust.
• Experience in designing and building data replication systems, data pipelines, or distributed data services.
• In-depth knowledge of distributed systems concepts, including consistency, availability, scalability, and fault tolerance.
• Strong expertise in multi-threading, concurrency, and parallel processing.
• Familiarity with networking protocols and secure communication (TCP/IP, HTTP/HTTPS, TLS).
• Experience in implementing data integrity mechanisms such as checksums, validation, and consistency checks.
• Proficiency in designing and building REST APIs and service-oriented architectures.
• Understanding of checkpointing, failure recovery, and retry mechanisms in distributed systems.
• Basic knowledge of observability concepts, including metrics, logging, and alerting.
• Strong debugging, problem-solving, and system design abilities.
• A dynamic and driven team environment.
• An opportunity for hands-on involvement.
• Engineering excellence is central to everything we do.