
Senior Site Reliability Engineer
Posted Jun 12

This is a fully remote position, open to applicants in the Philippines.
• Ensure the reliability and availability of the platform across both production and pre-production environments through proactive monitoring, alerting, and automation.
• Act as the first responder for incidents and contribute to problem management and root cause analysis.
• Support the development team's reliability initiatives and foster a strong reliability culture throughout the development lifecycle.
• Create troubleshooting documentation for production support resources.
• Collaborate with engineering teams to produce effective runbooks and operational documentation, and to automate operational tasks.
• Work alongside development and cloud engineering teams to integrate reliability and performance into the software delivery lifecycle.
• Design, implement, and enhance observability solutions (metrics, logs, traces, dashboards) utilizing tools such as Prometheus, Grafana, and ELK.
• Participate in on-call rotations and continuously refine alert quality and response processes.
• Promote a culture of reliability, performance, and continuous improvement across teams.
• Bachelor's or Master's degree in Engineering or a related field.
• Experience managing at least one container orchestration cluster (e.g., Kubernetes, Docker Swarm).
• Proven experience in developing or maintaining software for production services at scale.
• Familiarity with ELK.
• Experience with AWS.
• Knowledge of the Grafana/Prometheus stack.
• Strong scripting abilities (Bash, Python, or Go).
• Excellent communication skills.
• Ability to think creatively and anticipate challenges. It is crucial to be proactive rather than reactive; we must foresee challenges and critically evaluate existing technologies, procedures, and mindsets. Continuous review and questioning at all levels are expected.
• Versatility is essential. We employ agile/lean methodologies and prefer to iterate and learn rather than assume we have all the answers.
• A team player mentality is vital. You will not always work in isolation and should be enthusiastic about collaborating with product, experience design, engineering, and more.
• **Considered a plus:**
  - Telephony knowledge (SIP, VoIP);
  - Experience in Linux Administration (RedHat, CentOS, AL);
  - Working knowledge of Configuration Management tools (Terraform, Ansible);
  - Understanding of TCP/IP and general networking concepts;
  - RDBMS knowledge (MySQL, Postgres);
  - NoSQL knowledge (Redis).
• Competitive fixed compensation;
• Long-term employment with vacation days;
• Opportunities for professional development (courses, training, etc.);
• Be part of innovative technology products that have a global impact on the service industry;
• Work alongside skilled and enjoyable colleagues;
• Access to Apple gear.