This is a fully remote position, open to applicants in India.

• Deploy, oversee, and scale distributed platforms across various geographic locations

• Design and sustain Kubernetes-based infrastructure for extensive applications

• Create and manage Helm charts to facilitate efficient and repeatable deployments

• Monitor system health using Grafana dashboards and metrics; proactively identify and resolve issues

• Enhance system reliability, performance, and scalability through automation and adherence to best practices

• Manage large-scale deployments and enhance infrastructure to support growth

• Collaborate with development teams to ensure seamless CI/CD processes and production readiness

• Implement observability, alerting, and incident response protocols

• Diagnose production issues and conduct root cause analysis

• Document and maintain runbooks for incident response

• 4–5 years of experience in Site Reliability Engineering, DevOps, or related fields

• Extensive hands-on experience with Kubernetes in production settings

• Proficiency with infrastructure as code and related tooling (Terraform, Git, etc.)

• Strong expertise in AWS (EKS, VPC, S3, ECR, IAM roles, etc.)

• Solid experience with Helm charts for application deployment

• Proficient in Bash scripting and tooling

• Experience with large-scale distributed systems and high-availability architectures

• Strong understanding of containerization, microservices, and cloud-native ecosystems

• Familiarity with CI/CD pipelines and automation tools

• Strong debugging and problem-solving skills in production environments

• Competitive compensation

• Comprehensive benefits

• Career success on your terms

• Flexible work environment

• Annual wellness and community outreach days

• Continuous recognition for your contributions

• Global collaboration and networking opportunities

Site Reliability Engineer III
