This is a fully remote position, open to applicants in California.

📋 Description

• Ensuring the reliability, availability, and performance of production infrastructure and platform services.

• Operating and scaling Kubernetes platforms, including governance and support for multi-tenant workloads.

• Managing GitOps-based deployment workflows utilizing ArgoCD and Helm.

• Supporting infrastructure provisioning and change management through Terraform/Terragrunt.

• Building and maintaining CI/CD automation and deployment workflows using GitHub Actions.

• Participating in incident response, root cause analysis, and post-incident improvement initiatives.

• Minimizing operational toil through scripting, tooling, and process automation.

• Advancing observability practices across logs, metrics, traces, dashboards, and alerts.

• Supporting secure secrets integration, IAM-aware operations, and platform guardrails.

• Collaborating closely with application, security, and platform teams to enhance reliability and delivery outcomes.

⛳️ Requirements

• Over 4 years of experience in Site Reliability Engineering, DevOps, Platform Engineering, or Cloud Infrastructure.

• Extensive hands-on experience operating AWS in production settings.

• Strong expertise in Kubernetes, including cluster operations, troubleshooting, workload reliability, and platform management.

• Experience with Kubernetes multi-tenancy, encompassing namespaces, RBAC, quotas, policies, and tenant isolation strategies.

• Proficient in implementing and managing ArgoCD within a GitOps delivery framework.

• Solid hands-on experience with Helm.

• Familiarity with Terraform/Terragrunt for infrastructure provisioning and environment management.

• Strong scripting and automation capabilities using Bash and/or Python.

• Experience in building, maintaining, or supporting CI/CD pipelines, preferably using GitHub Actions.

• Excellent troubleshooting skills across Linux, containers, IAM, networking, and distributed systems.

• Proficient in monitoring, alerting, and observability within production environments.

• Demonstrated ownership mindset with experience in managing incidents and resolving production issues.

• Strong collaboration and communication abilities, effectively working across engineering, security, and platform teams.

• Bachelor’s degree in computer science, engineering, or a related field, or equivalent professional experience.

• Proven ability to use AI to improve the speed and quality of day-to-day work and deliverables.

• Strong track record of critically evaluating and verifying AI-assisted work (e.g., testing, source-checking, data validation, peer review).

• High integrity and ownership: protecting sensitive data, avoiding over-reliance on AI, and remaining accountable for final decisions and deliverables.

🏝️ Benefits

• Information regarding the culture at Pinterest and benefits available for this position can be found here.

Site Reliability Engineer II

People also viewed

Cloud Engineer – DevOps

DevSecOps/DevOps Engineer

Deployment Engineer

Senior Cloud - Kubernetes SRE

DevOps Engineer

DevSecOps Engineer