
Staff SRE Engineer
Posted May 20

This is a fully remote position, open to applicants in Taiwan.
• Operate and maintain container orchestration platforms and containerized workloads.
• Monitor and resolve issues in production systems, taking part in on-call rotations to ensure system reliability.
• Drive improvements in observability by upgrading monitoring, logging, and alerting capabilities across systems and data platforms.
• Manage and enhance cloud-based environments across various service providers.
• Support and administer distributed data platforms and real-time processing systems.
• Build and maintain continuous integration and delivery pipelines for efficient and dependable deployments.
• Lead and apply Infrastructure as Code (IaC) practices to ensure consistency and scalability.
• Automate and orchestrate infrastructure using programming and scripting languages.
• Perform system administration and networking tasks in support of both internal and external environments.
• Collaborate effectively with engineers and stakeholders across different time zones.
• Minimum of 5 years of experience in Site Reliability Engineering, DevOps, or Platform Engineering roles.
• Proven track record in managing large-scale production systems within cloud environments (AWS, GCP, Azure, or OCI).
• Demonstrated leadership in directing incident response, establishing on-call best practices, and fostering a reliability-focused culture.
• Strong background in production on-call operations and incident management.
• Advanced skills in Kubernetes administration and troubleshooting.
• Practical experience with observability tools such as Prometheus, Grafana, Loki, and Alertmanager.
• Familiarity with chat-based operational interfaces and/or auto-remediation controllers using AI agentic frameworks.
• Understanding of AI agents for auto-triaging alerts, correlating signals, and suggesting root-cause hypotheses.
• Expertise in managing data platforms including Elasticsearch, MongoDB, Spark, Kafka, and Redis.
• Proficiency in public cloud services (AWS, Azure, GCP, or OCI).
• Strong programming and automation capabilities in Python and Bash.
• In-depth knowledge of Infrastructure as Code tools (Terraform, Helm).
• Experience with CI/CD pipelines (GitHub Actions, Bitbucket Pipelines, ArgoCD).
• Solid technical foundation in distributed systems, databases, networking, and Linux administration.
• Excellent problem-solving, communication, and leadership skills.
• Bachelor’s degree in Computer Science, Engineering, or a related technical discipline.
• Certifications in AWS, GCP, Observability, Linux, or Kubernetes are advantageous.
• Competitive salary and performance-based incentives.
• Comprehensive health and wellness benefits.
• Opportunities for professional development and continuous learning.
• Flexible work hours and remote work options.
• Supportive and inclusive company culture.