Remotery

T3 Operations & Support Specialist – Compute & OS

Posted 2 days ago

This is a fully remote position, open to applicants in Germany.

📋 Description

• Delivering T3 operational leadership for Compute & OS services: managing intricate incidents, troubleshooting, conducting root cause analysis (RCA), and driving sustainable solutions and preventive actions.

• Ensuring compute/OS readiness for releases and changes: overseeing monitoring and alerting frameworks, establishing performance baselines, hardening security, developing patch, rollback, and recovery strategies, and preparing runbooks.

• Executing and refining standard operational practices through automation to minimize manual effort, reduce Mean Time to Recovery (MTTR), and improve system stability.

• Collaborating with Kubernetes, Data, Network, and Storage Subject Matter Experts (SMEs) to address cross-domain production challenges.

• Assessing deployment artifacts from an operational standpoint and enforcing quality assurance protocols.

• Monitoring system health, performance indicators, and service availability across multi-tenant environments.

• Identifying, analyzing, and resolving incidents to reduce service interruptions while initiating RCA and corrective measures.

• Establishing monitoring and logging frameworks to meet audit and compliance prerequisites.

• Conducting regular security assessments and addressing identified vulnerabilities.


⛳️ Requirements

• 5 to 10+ years of experience in IT operations, service delivery, or platform operations.

• Demonstrated expertise in implementing and leading Incident, Problem, Change, and Release governance in a production environment.

• Practical experience with VMware 8 virtualization.

• Proficiency in Operating Systems: Red Hat Enterprise Linux and Ubuntu.

• Familiarity with OS tools: Satellite, IPA, Certificate Server.

• Experience with ITSM and collaboration tools: Jira Service Management, Jira, Confluence.

• Solid understanding of core ITSM processes (Incident, Change, and Problem management) and Site Reliability Engineering (SRE) principles.

• Experience in extracting operational insights from monitoring/observability, including management of SLI/SLA/SLO and performance tracking.

• Practical experience in documenting procedures and enforcing clear runbooks and playbooks.

• Hands-on experience with monitoring and logging tools (e.g., Prometheus, Grafana, Datadog, Mimir, Loki).

• Understanding of modern platform operations (Kubernetes/containers, automation, observability) sufficient to oversee specialists.

• Proficiency in English and German (minimum C1 level in both languages).


🏝️ Benefits

• Flexible working hours.

• Autonomy in selecting projects.

• Opportunity to engage in exciting projects across various industries.

• Support for career advancement.

• Competitive compensation.

• Dedicated team available for assistance.
