Remotery

Customer Site Reliability Engineer – OpenShift Managed Cloud Services, Spoken Japanese, Kubernetes/AWS/Azure, Linux

Posted 6 days ago

This is a fully remote position, open to applicants in Australia.

📋 Description

• Oversee large-scale, distributed systems with a focus on reducing downtime and enhancing system resilience.

• Uphold customer trust and confidence by ensuring the stability and functionality of services.

• Propel ongoing improvements in processes, tools, and methodologies to meet the evolving needs of the service.

• Spearhead the creation of code and automation scripts aimed at optimizing the scalability, reliability, and performance of services.

• Take charge of and engage in high-priority customer escalations, adopting a customer-centric approach.

• Organize and carry out complex incident response procedures, ensuring prompt resolution and comprehensive postmortems.

• Collaborate with cross-functional teams to bolster system robustness.

• Exhibit a proactive attitude to preempt escalations and ensure dependable operations.

• Record resolutions, root causes, and best practices to enhance the knowledge base and promote self-service solutions.

• Guide and mentor team members, nurturing a culture of continuous learning, knowledge sharing, and collaboration.

• Participate in the on-call rotation and provide leadership during critical incidents.

• Collaborate on strategic AI and automation initiatives aimed at improving the efficiency of fleet operations and troubleshooting, ultimately delivering an enhanced product experience for customers.


⛳️ Requirements

• Advanced experience with OpenShift/Kubernetes for container platform support or administration.

• Proficient in container-based technologies operating on Linux.

• Skilled in managing Linux-based systems within public cloud environments such as AWS, Azure, or GCP.

• Advanced experience with enterprise systems monitoring; knowledge of Prometheus is preferred.

• Advanced proficiency with enterprise configuration management and infrastructure-as-code tools such as Ansible and Terraform.

• Software engineering experience using object-oriented programming languages; Go is preferred.

• Excellent communication skills with experience in direct customer interaction and presentations.

• Ability to rapidly learn new technologies and stay updated with industry trends.

• Proven ability to diagnose system issues quickly and accurately.

• Strong understanding of standard TCP/IP networking and common protocols.

• Proficient in English, with additional languages such as Japanese, Chinese, Korean, or Spanish being an advantage.


🏝️ Benefits

• Health insurance

• Flexible working hours

• Professional development opportunities
