
Senior Software Engineer, Platform & Infrastructure
Posted 6 days ago

This is a fully remote position, open to applicants in Germany.

Responsibilities
• Own and improve our multi-cluster Kubernetes platform: manage Helm chart deployments from an OCI registry hosting more than 40 charts, enforce policy as code with Kyverno, and run GitOps workflows through Argo CD ApplicationSets, with progressive delivery orchestrated by Kargo.
• Lead technical designs for platform projects: define the problem, propose several solutions, weigh their trade-offs, and advocate for your recommendation, then see it through to production.
• Strengthen the platform's security posture: workload identity via OIDC, runtime security, image scanning, and secrets management.
• Build and maintain custom Kubernetes operators and internal tooling in Go and Python that multiply the team's leverage across clusters. Alongside our own operators, we run the Zalando postgres-operator.
• Maintain and improve our observability stack (Prometheus, Grafana, Thanos, OpenSearch), building dashboards and alerts that give product teams clear visibility into their services.
• Keep our GitLab CI/CD pipelines fast and reliable; they support roughly 150 production deployments per month. Coordinate cross-team changes (rollouts, database migrations, certificate rotations) carefully and communicate them clearly.
• Manage and evolve our AWS infrastructure (EKS, VPC, IAM, RDS, S3), including dedicated customer environments, each in its own AWS account, and lead cost-efficiency initiatives tracked with OpenCost.
• Own incidents end to end: from alert through resolution to postmortem.
• Raise the team's bar through thorough code and architecture reviews, mentor junior engineers, and help assess technical candidates in interviews.
• Act as the infrastructure point of contact for product teams when issues arise or they need guidance.
• Participate in a shared on-call rotation.

Requirements
• 5+ years of professional engineering experience, including at least 3 years in infrastructure, platform, or site reliability engineering.
• Deep hands-on Kubernetes experience: cluster operations, workload management, and troubleshooting at scale.
• Strong Helm chart authoring skills: writing, packaging, and maintaining charts, not just consuming them.
• Experience with GitOps using Argo CD or a similar tool, such as Flux.
• Working knowledge of AWS, particularly EKS and related services (IAM, VPC, RDS, S3).
• Experience with Infrastructure as Code (Terraform or similar).
• Proficiency in Go or Python; our custom operators and internal tools are written in both.
• A track record of owning production incidents end to end, including response, mitigation, and postmortems.
• Excellent English communication skills, with the ability to explain technical decisions to both engineers and non-technical stakeholders.

What we offer
• Flexible working arrangements (remote, office, or hybrid).
• Modern office located in the heart of Hanover for hybrid work.
• Up to 180 days (6 months) of remote work from abroad.
• Competitive compensation and 30 days (6 weeks) of paid vacation.
• Modern hardware and software.