
Site Reliability Engineer II
Posted Jun 21

This is a fully remote position, open to applicants in California.
• Ensuring the reliability, availability, and performance of production infrastructure and platform services.
• Operating and scaling Kubernetes platforms, including governance and support for multi-tenant workloads.
• Managing GitOps-based deployment workflows utilizing ArgoCD and Helm.
• Supporting infrastructure provisioning and change management through Terraform/Terragrunt.
• Building and maintaining CI/CD automation and deployment workflows using GitHub Actions.
• Participating in incident response, root cause analysis, and initiatives for post-incident improvement.
• Minimizing operational toil through scripting, tooling, and process automation.
• Advancing observability practices across logs, metrics, traces, dashboards, and alerts.
• Supporting secure secrets integration, IAM-aware operations, and platform guardrails.
• Collaborating closely with application, security, and platform teams to enhance reliability and delivery outcomes.
• Over 4 years of experience in Site Reliability Engineering, DevOps, Platform Engineering, or Cloud Infrastructure.
• Extensive hands-on experience operating AWS in production settings.
• Strong expertise in Kubernetes, including cluster operations, troubleshooting, workload reliability, and platform management.
• Experience with Kubernetes multi-tenancy, encompassing namespaces, RBAC, quotas, policies, and tenant isolation strategies.
• Proficient in implementing and managing ArgoCD within a GitOps delivery framework.
• Solid hands-on experience with Helm.
• Familiarity with Terraform/Terragrunt for infrastructure provisioning and environment management.
• Strong scripting and automation capabilities using Bash and/or Python.
• Experience in building, maintaining, or supporting CI/CD pipelines, preferably using GitHub Actions.
• Excellent troubleshooting skills across Linux, containers, IAM, networking, and distributed systems.
• Proficient in monitoring, alerting, and observability within production environments.
• Demonstrated ownership mindset with experience in managing incidents and resolving production issues.
• Strong collaboration and communication abilities, effectively working across engineering, security, and platform teams.
• Bachelor’s degree in computer science, engineering, a related field, or equivalent professional experience.
• Proven ability to use AI to improve the speed and quality of day-to-day work.
• Strong track record of critically evaluating and verifying AI-assisted work (e.g., testing, source-checking, data validation, peer review).
• High integrity and ownership: protecting sensitive data, avoiding over-reliance on AI, and remaining accountable for final decisions and deliverables.
• Information regarding the culture at Pinterest and benefits available for this position can be found here.