This is a fully remote position, open to applicants in Germany.

📋 Description

• Taking T3 operational ownership of Storage services, including managing complex incidents, performing in-depth troubleshooting and root cause analysis (RCA), and driving long-term solutions and preventive actions.

• Ensuring readiness for storage changes by overseeing monitoring and alerting coverage, establishing performance baselines, hardening systems, developing patch strategies, and creating rollback and recovery procedures, along with runbooks.

• Executing and enhancing standard operational procedures through automation, covering capacity checks, validation processes, and provisioning workflows.

• Validating deployment artifacts from an operational standpoint and enforcing quality assurance protocols.

• Monitoring system health, performance metrics, and service availability across multi-tenant environments.

• Identifying, analyzing, and resolving incidents to minimize service disruptions while initiating RCA and corrective actions.

• Implementing monitoring and logging strategies to meet audit and compliance requirements.

• Conducting routine security scans and addressing identified vulnerabilities.

⛳️ Requirements

• Over 5 years of experience in IT storage operations, service delivery, or platform operations, demonstrating leadership in critical environments.

• Proven track record in implementing and managing Incident, Problem, Change, and Release governance in production settings.

• Experience in supporting platform workloads that depend on shared storage services.

• Familiarity with storage types including File, Block, and Object Storage using NetApp (ONTAP).

• Knowledge of protocols and services such as NFS and object storage operations (S3-like concepts).

• Experience with Kubernetes storage integration, specifically with CSI driver concepts and troubleshooting related to PV/PVC lifecycles.

• Experience operating storage virtualization in enterprise settings.

• Proficiency with ITSM and collaboration tools including Jira Service Management, Jira, and Confluence.

• Strong understanding of core operations processes (Incident, Change, Problem management, ITSM) and Site Reliability Engineering (SRE) concepts.

• Experience in gathering operational insights from monitoring and observability, including management and tracking of SLI/SLA/SLO.

• Practical experience in documenting procedures and enforcing clear runbooks and playbooks.

• Hands-on experience with monitoring and logging tools such as Prometheus, Grafana, Datadog, Mimir, and Loki.

• Understanding of modern platform operations (Kubernetes/containers, automation, observability) sufficient to oversee specialists.

• Fluency in English and German (minimum C1 level in both languages).

🏝️ Benefits

• Flexible working hours.

• The autonomy to select your own projects.

• Access to engaging projects across various industries.

• Competitive salary.

• A dedicated team available to assist with any inquiries you may have.

• Opportunity to work independently while leveraging our extensive network to reach your professional objectives.

T3 Operations & Support Specialist – Storage

