
Site Reliability Engineer
Posted May 21

This is a fully remote position, open to applicants in India.
• Use a range of monitoring and alerting tools to solve complex programming challenges at scale.
• Oversee and optimize multiple essential customer-facing Apache Pinot clusters.
• Track availability, read/write latencies, and other important telemetry to proactively detect SLO violations and assist in issue resolution.
• Establish a strong relationship with customers to effectively mitigate and resolve incidents.
• Implement disaster recovery plans with minimal downtime.
• Work collaboratively with other engineers to understand and troubleshoot systems, using the insights gained to shape the roadmap of other teams.
• Over 5 years of experience in an engineering role (SRE, SDET, or development).
• Background in managing highly available production-facing distributed systems, along with a solid understanding of Java, is advantageous.
• Familiarity with cloud platforms such as AWS, GCP, or Azure.
• Experience with Kubernetes and container orchestration technologies.
• Knowledge of streaming systems like Kafka, Pulsar, Flume, Flink, Spark, or similar technologies.
• Understanding of best practices related to security, performance, and disaster recovery.
• Excellent troubleshooting and critical thinking abilities.
• Health insurance.
• Flexible work arrangements.
• Opportunities for professional development.