This is a fully remote position, open to applicants in Malaysia.

📋 Description

• Oversee, sustain, and enhance the reliability, availability, and performance of production systems and services.

• Develop and uphold infrastructure as code (IaC), deployment pipelines, and automation to facilitate continuous delivery, scalability, and disaster recovery.

• Address incidents, conduct root-cause analyses, and lead postmortems to ensure that lessons learned are implemented.

• Apply and enforce operational best practices: observability, logging, metrics, alerting, capacity planning, failover strategies, and backups.

• Collaborate with Engineering, Product, Compliance, and Operations teams to guarantee that infrastructure adheres to reliability, compliance, and security standards.

• Assist with service scaling, database operations, cloud infrastructure (preferably GCP), networking, and microservices orchestration.

• Create documentation for operational runbooks, on-call procedures, and system architecture to aid maintenance, knowledge sharing, and compliance.

⛳️ Requirements

• Proficiency in programming or scripting (Go, Python, Bash, or similar) for automation, tooling, and operational tasks.

• Practical experience with cloud infrastructure, particularly Google Cloud Platform (GCP).

• Knowledge of containerization and orchestration (Docker, Kubernetes, or equivalent).

• Experience with infrastructure-as-code tools (Terraform, Cloud Deployment Manager, or similar).

• Familiarity with either FluxCD or ArgoCD for GitOps-based delivery.

• Strong grasp of distributed systems, microservices architecture, and reliability patterns.

• Experience setting up monitoring, logging, alerting, and observability (e.g., Prometheus, Grafana, ELK, distributed tracing).

• Excellent troubleshooting skills and the ability to respond to incidents under pressure.

• Understanding of backup and disaster recovery strategies, database management, and secure operations.

• Ownership mindset: proactive, responsible, and dedicated to system reliability.

• Strong communication skills, with the ability to coordinate with both technical and non-technical stakeholders.

• Comfortable working in a fast-paced, early-stage startup environment.

• High integrity, attention to detail, and a passion for fintech and programmable banking systems.

🏝️ Benefits

• Competitive salary and meaningful equity with opportunities for growth.

Senior Site Reliability Engineer
