Remotery

Platform Infrastructure Engineer – SRE Core

Posted 1 day ago

📋 Description

• Design, deploy, and maintain VM and Kubernetes infrastructure on GCP and AWS across numerous clusters that span development, staging, and production environments in various regions.

• Work with colleagues on your team and across departments to make sure your tasks address the problems we need to solve.

• Build and maintain Infrastructure as Code (IaC) with Terraform modules, managing resources through Spacelift or a similar Terraform Automation and Collaboration Software (TACOS) platform. Provision cloud infrastructure, including networking, compute, storage, and security components, primarily on GCP, with additional support for AWS.

• Implement and oversee workflows with advanced multi-layer configuration management.

• Develop and maintain extensive observability solutions using Grafana Cloud, Prometheus/Mimir, and OTel collectors. Design Grafana dashboards, set up alerting rules, and ensure visibility across all platform components.

• Administer certificate lifecycle, DNS automation, ingress controllers, and service mesh networking with Cilium.

• Collaborate with Engineering, Product, Compliance, and Security teams to develop resilient, scalable systems. Provide consultation on capacity planning, disaster recovery, and architectural choices for cloud-native applications.

• Identify and reduce toil through automation. Create scripts, develop tools, and construct CI/CD pipelines to enhance operational efficiency and minimize manual tasks.

• Engage in a 24x7 on-call rotation as part of a globally distributed team, responding to incidents and facilitating post-incident reviews.


⛳️ Requirements

• Bachelor's degree in Computer Science, a related technical field of study, or equivalent practical experience.

• Proficiency in popular programming and scripting languages, with a strong focus on Python, Bash, and Go.

• Understanding of network topologies, communication protocols (e.g., TCP/IP, HTTP/S, UDP, TLS), and enterprise-grade connectivity solutions.

• Expertise in Kubernetes, including cluster administration, RBAC, networking, workload management, and troubleshooting within production environments.

• Demonstrated experience with Terraform for infrastructure provisioning and management.

• Familiarity with Google Cloud Platform services, such as GKE, VPC networking, Cloud DNS, Artifact Registry, Secret Manager, IAM, Gemini Code Assist, and Workload Identity.

• Experience with GitOps methodologies and tools.


🏝️ Benefits

• Collaborative, inclusive, and enjoyable culture

• Opportunities to take initiative

• Support for new ideas

• Open communication
