
Senior IaaS / Kubernetes Platform Engineer
Posted Jun 12

This is a fully remote position, open to applicants in Poland.
Responsibilities:
• Design, develop, and manage a multi-tenant Kubernetes platform utilizing Cluster API (CAPI) with bare-metal providers (Metal3/Sidero).
• Implement hard multi-tenancy with vCluster (Loft Labs) or equivalent technologies, ensuring an isolated Kubernetes API server for each tenant.
• Deploy and oversee KubeVirt for VM orchestration within Kubernetes, including configurations for CPU pinning, NUMA awareness, and HugePages.
• Establish GitOps-driven infrastructure using ArgoCD or Flux, with Git as the single source of truth for all cluster configuration.
• Deploy and manage Policy-as-Code with Kyverno or OPA Gatekeeper for admission control, resource quotas, and security policies.
• Create self-service capabilities using Crossplane or comparable Kubernetes-native infrastructure provisioning tools.
• Operate and enhance Ceph distributed storage clusters (currently 1 PiB raw, 149 OSDs, Quincy 17.2.5).
• Manage Rook-Ceph operator deployments at scale on modern Kubernetes (v1.28+).
• Implement storage tiering: Ceph for bulk storage, local NVMe for high-IOPS workloads, and LINSTOR/DRBD or TopoLVM for ultra-fast replicated storage.
• Deploy and manage overlay networks for pod networking, micro-segmentation, and WireGuard/IPsec encryption.
• Adhere to SRE practices: define and uphold SLOs with error budgets, and implement proactive capacity management with forecasting over 6-12 months.
• Develop and maintain Terraform/OpenTofu modules for multi-cloud infrastructure provisioning.
• Write Ansible playbooks for the configuration of bare-metal servers and fleet management.
Requirements:
• 5+ years of experience in infrastructure/platform engineering roles, including a minimum of 3 years managing production Kubernetes clusters (not solely deploying apps on K8s, but building and overseeing the platform itself).
• Proven production experience with at least 3 of the following: KubeVirt or similar VM-on-K8s technology; Cluster API (CAPI) for declarative cluster lifecycle management; Cilium or Calico (advanced CNI with eBPF or BGP integration); Rook-Ceph or other Kubernetes storage operators at scale (100+ OSDs); ArgoCD or Flux for GitOps-driven infrastructure management.
• Extensive knowledge of Linux systems: kernel tuning, networking stack (iptables/nftables, routing, bonding, VLAN), filesystem operations, and performance troubleshooting.
• Experience with Ceph distributed storage: cluster operations, OSD lifecycle management, pool management, performance tuning, and troubleshooting degraded states.
• Proficiency in Infrastructure as Code: Terraform/OpenTofu + Ansible at production scale.
• Experience with bare-metal infrastructure: IPMI/iDRAC, PXE boot, RAID configuration, hardware diagnostics, and datacenter operations.
• Understanding of networking fundamentals: BGP, VLAN, IPsec/WireGuard, DNS, and load balancing.
• Strong written and verbal communication skills in English (B2+ minimum); documentation, postmortems, and cross-team communication are conducted in English.
• Proactive approach: demonstrated history of identifying issues before they escalate into incidents and initiating improvements independently.
Benefits:
• Emphasis on professional development.
• Engaging and challenging projects.
• Fully remote work with flexible working hours, allowing you to manage your schedule and work from any location worldwide.
• 24 days of paid vacation each year, 10 days of national holidays, and unlimited sick leave.
• Compensation for private medical insurance.
• Reimbursement for co-working and gym/sports expenses.
• Budget allocated for education.
• Opportunity to earn a reward for the most innovative idea that the company can patent.
TechBiz Global