
Cloud Operations Engineer
Posted May 7

This is a fully remote position, open to applicants in California.
• Design, build, and maintain cloud infrastructure on GCP using infrastructure-as-code (Terraform).
• Manage and enhance our Kubernetes platform, focusing on cluster operations, workload configuration, and service mesh (Istio).
• Develop and refine internal tools that simplify cloud complexities and enhance the developer experience.
• Collaborate with product engineering teams to understand service deployment requirements and deliver infrastructure solutions.
• Monitor platform health using Datadog; proactively identify and resolve performance, availability, and security issues.
• Participate in on-call rotations and incident response; conduct blameless post-mortems to address recurring issues at their root cause.
• Define and monitor service-level indicators and objectives (SLIs/SLOs) for critical platform components.
• Implement and enhance alerting, dashboards, and runbooks that minimize mean time to resolution.
• Integrate security best practices into infrastructure workflows (DevSecOps) as a design principle, rather than an afterthought.
• Assist in maintaining cloud security posture, IAM hygiene, and policy guardrails throughout our cloud environment.
• Stay updated on cloud security developments and actively communicate risks to the team.
• Execute and uphold our automated disaster recovery processes.
• Work closely with product engineering teams to understand their needs and reduce infrastructure friction.
• Clearly document systems, processes, and architectural decisions to ensure knowledge is shared rather than siloed.
• Propose enhancements to tools, architecture, and processes, and assist in driving these improvements to completion.
• Stay current with the evolving cloud-native ecosystem and share relevant knowledge with the team.
• Bachelor’s degree in Computer Science or a related field.
• Over 5 years of experience in cloud infrastructure, platform engineering, or a similar field.
• Hands-on experience with Kubernetes in production settings, including cluster management, workloads, and networking.
• Proficient with infrastructure-as-code tools, especially Terraform.
• Experience with at least one major cloud provider (GCP, AWS, or Azure).
• Strong scripting and automation skills in Python, Bash, or a similar language.
• Experience with contemporary observability platforms (Datadog, Grafana, or similar).
• Solid understanding of Linux systems administration.
• Familiarity with CI/CD concepts and tools (GitHub Actions, ArgoCD, Jenkins, or similar).
• Exceptional communication skills — able to write clearly, ask insightful questions, and explain complex systems in an accessible manner.
• AI-Augmented Development: Demonstrated ability to use AI-enabled development tools (e.g., Claude Code, Cursor) to streamline coding, debugging, and infrastructure-as-code authoring.
• Health insurance.
• 401(k) matching.
• Flexible work hours.
• Paid time off.
• Professional development opportunities.
Innovative Solutions