
Senior Site Reliability Engineer
Posted Jun 20

This is a fully remote position, open to applicants in the United States.
• Design, implement, and maintain highly available and scalable infrastructure solutions for our projects, products, and customers.
• Monitor and analyze system performance, identifying and resolving bottlenecks and issues to ensure optimal performance and reliability.
• Automate infrastructure deployment and configuration management processes.
• Continuously enhance system reliability, security, and efficiency through proactive monitoring, capacity planning, and performance tuning.
• Troubleshoot and resolve complex infrastructure and application issues in production and testing environments.
• Collaborate with software engineering teams to design and implement systems that are resilient, scalable, and secure.
• Participate in on-call rotations and respond to production incidents promptly.
• Document system configurations, troubleshooting procedures, and operational guidelines.
• Proven experience as a Site Reliability Engineer or in a comparable role.
• Strong understanding of networking, operating systems, and cloud infrastructure.
• Experience in Site Reliability Engineering, System Design, and Distributed Computing.
• Proficiency in multiple programming languages; we currently have SDKs for Node.js, Java, Python, Ruby, and Go.
• Experience with containerization technologies such as Docker and Kubernetes.
• Knowledge of infrastructure-as-code tools like Terraform and Pulumi.
• Familiarity with monitoring and logging tools (e.g., Prometheus, Grafana, ELK stack).
• Experience with lower-level implementation details of relational databases (bonus if you have experience with distributed SQL databases like Google Cloud Spanner or CockroachDB).
• Experience working with Git and GitHub.
• Familiarity with continuous integration and deployment systems.
• Strong problem-solving and troubleshooting skills.
• Excellent communication and collaboration abilities.
• Competitive salary based on experience.
• Stock options at an early-stage startup.
• Comprehensive benefits including healthcare (US-based) and other insurance.
• A fully remote role with a flexible schedule to accommodate different time zones.
• Twice-yearly travel for team offsites focused on team bonding, collaboration, and having fun!