
Cloud Operations Engineer
Posted Jun 20

This is a fully remote position, open to applicants in California.
Platform & Infrastructure
• Design, build, and maintain cloud infrastructure on GCP using infrastructure-as-code (Terraform).
• Operate and evolve our Kubernetes platform, including cluster operations, workload configuration, and the service mesh (Istio).
• Build and improve internal tools that hide cloud complexity and improve the developer experience.
• Work with product engineering teams to understand service deployment requirements and deliver infrastructure solutions.
Reliability & Observability
• Monitor platform health with Datadog; proactively identify and resolve performance, availability, and security issues.
• Participate in on-call rotations and incident response; conduct blameless post-mortems and fix recurring issues at their root cause.
• Define and track service-level indicators and objectives (SLIs/SLOs) for critical platform components.
• Build and improve alerting, dashboards, and runbooks that reduce mean time to resolution (MTTR).
Security & Compliance
• Build security best practices into infrastructure workflows (DevSecOps) as a foundational principle, not an afterthought.
• Help maintain cloud security posture, IAM hygiene, and policy guardrails across our cloud environment.
• Stay current with cloud security developments and proactively flag risks to the team.
• Run and maintain our automated disaster recovery procedures.
Collaboration & Growth
• Work closely with product engineering teams to understand their requirements and remove infrastructure friction.
• Document systems, processes, and architectural decisions clearly so that knowledge is shared, not siloed.
• Propose improvements to tools, architecture, and processes, and help drive them to completion.
• Stay informed about the evolving cloud-native ecosystem and share relevant insights with the team.
• Bachelor's degree in Computer Science or a related field.
• Over 5 years of experience in cloud infrastructure, platform engineering, or a related area.
• Practical experience with Kubernetes in production environments (cluster management, workloads, networking).
• Proficient with infrastructure-as-code tools, especially Terraform.
• Experience with at least one major cloud provider (GCP, AWS, or Azure).
• Strong scripting and automation skills in Python, Bash, or a similar language.
• Experience with modern observability platforms (Datadog, Grafana, or equivalent).
• Solid understanding of Linux systems administration.
• Familiarity with CI/CD concepts and tools (GitHub Actions, ArgoCD, Jenkins, or similar).
• Excellent communication skills: able to write clearly, ask insightful questions, and explain complex systems in an accessible way.
• AI-Augmented Development: Able to demonstrate effective use of AI-assisted development tools (e.g., Claude Code, Cursor) to speed up coding, debugging, and infrastructure-as-code authoring.
• Health insurance.
• 401(k) matching.
• Flexible work hours.
• Paid time off.
• Professional development opportunities.
Innovative Solutions