
T3 Operations & Support Specialist – Storage
Posted 2 days ago

This is a fully remote position, open to applicants in Germany.
• Taking T3 operational ownership of Storage services, including managing complex incidents, performing in-depth troubleshooting and root cause analysis (RCA), and driving long-term solutions and preventive actions.
• Ensuring readiness for storage changes by overseeing monitoring and alerting coverage, establishing performance baselines, hardening systems, developing patch strategies, and creating rollback and recovery procedures, along with runbooks.
• Executing and enhancing standard operational procedures through automation, covering capacity checks, validation processes, and provisioning workflows.
• Validating deployment artifacts from an operational standpoint and enforcing quality assurance protocols.
• Monitoring system health, performance metrics, and service availability across multi-tenant environments.
• Identifying, analyzing, and resolving incidents to minimize service disruptions while initiating RCA and corrective actions.
• Implementing monitoring and logging strategies to meet audit and compliance requirements.
• Conducting routine security scans and addressing identified vulnerabilities.
• Over 5 years of experience in IT storage operations, service delivery, or platform operations, demonstrating leadership in critical environments.
• Proven track record in implementing and managing Incident, Problem, Change, and Release governance in production settings.
• Experience in supporting platform workloads that depend on shared storage services.
• Familiarity with storage types including File, Block, and Object Storage using NetApp (ONTAP).
• Knowledge of protocols and services such as NFS and object storage operations (S3-like concepts).
• Experience with Kubernetes storage integration, specifically with CSI driver concepts and troubleshooting related to PV/PVC lifecycles.
• Experience operating storage virtualization in enterprise settings.
• Proficiency with ITSM and collaboration tools including Jira Service Management, Jira, and Confluence.
• Strong understanding of core operations processes (Incident, Change, Problem management, ITSM) and Site Reliability Engineering (SRE) concepts.
• Experience in gathering operational insights from monitoring and observability data, including managing and tracking SLIs, SLOs, and SLAs.
• Practical experience in documenting procedures and enforcing clear runbooks and playbooks.
• Hands-on experience with monitoring and logging tools such as Prometheus, Grafana, Datadog, Mimir, and Loki.
• Understanding of modern platform operations (Kubernetes/containers, automation, observability) sufficient to oversee specialists.
• Fluency in English and German (minimum C1 level in both languages).
• Flexible working hours.
• The autonomy to select your own projects.
• Access to engaging projects across various industries.
• Competitive salary.
• A dedicated team available to assist with any inquiries you may have.
• Opportunity to work independently while leveraging our extensive network to reach your professional objectives.
pathway solutions