This is a fully remote position, open to applicants in India.

📋 Description

• Design, build, and maintain scalable, reliable systems on GCP (Compute Engine, GKE, Cloud Storage, Cloud SQL)

• Automate infrastructure provisioning with Terraform, Ansible, or Deployment Manager

• Establish and manage observability platforms (monitoring, logging, tracing) using tools such as Cloud Monitoring (formerly Stackdriver), Prometheus, or Grafana

• Lead incident response, conduct postmortems, and implement improvements to prevent recurrence

• Collaborate with DevOps and engineering teams to improve CI/CD pipelines for robust deployments

• Define and track SLAs, SLOs, and SLIs to ensure application availability and performance

• Implement disaster recovery (DR) and backup strategies across cloud services

• Continuously optimize the performance, capacity, and cost-efficiency of GCP resources

⛳️ Requirements

• Bachelor's degree in Computer Science, Engineering, or a related discipline

• More than 3 years of hands-on experience as a Site Reliability Engineer, DevOps Engineer, Systems Engineer, or Cloud Infrastructure Engineer, with a proven track record of managing production-grade systems on Google Cloud Platform (GCP) or other cloud environments

• Solid understanding of Linux/Unix system administration, networking, and troubleshooting

• Experience implementing Infrastructure as Code (IaC) with tools such as Terraform, Ansible, or Deployment Manager

• Familiarity with containerization and orchestration technologies like Docker and Kubernetes (GKE)

• Proficiency with monitoring and observability tools (Google Cloud Operations Suite, Prometheus, Grafana, Datadog, ELK)

• Experience defining and monitoring SLAs, SLOs, and SLIs to ensure application uptime and performance

• Demonstrated ability to manage incident response, conduct postmortems, and perform root cause analysis

• Proficiency in at least one scripting language (Python, Bash, or Go) for automation and tooling, hands-on experience building or managing CI/CD pipelines (Jenkins, GitLab CI, Cloud Build), and a strong background in configuration management and release automation

• Knowledge of Identity and Access Management (IAM), network security, and cloud compliance controls, along with familiarity with disaster recovery (DR), backups, and high-availability design

• Strong written and spoken English communication skills

🏝️ Benefits

• Comprehensive and affordable medical, dental, vision, and life insurance options

• Competitive Provident Fund contributions

• Paid time off and holidays

• Mental health support and wellbeing program

• Company-provided equipment and a one-time USD 250 work-from-home stipend

• USD 750 annual professional development budget

• Company rewards and recognition program

• And more!

Site Reliability Engineer
