
Site Reliability and DevOps Engineering Lead
Posted 11 hours ago

This is a fully remote position, open to applicants in California.
• Lead, mentor, and develop Platform / DevOps engineers.
• Establish a high-performing Platform team.
• Foster accountability for platform reliability and delivery outcomes.
• Guide vendors in delivering capabilities in production.
• Ensure that platform capabilities facilitate product delivery and eliminate bottlenecks.
• Define and uphold platform engineering standards and DevOps practices across all teams and vendors.
• Oversee capacity planning, performance optimization, and cost efficiency.
• Establish operational standards, runbooks, and reliability practices.
• Take responsibility for platform reliability outcomes at both enterprise and product levels.
• Serve as the technical authority in platform, reliability, and delivery.
• Formulate platform strategy and roadmap.
• Govern delivery across internal teams and external vendors.
• Own SLIs, SLOs, and error budgets.
• Lead resilience engineering, observability, and failure design.
• Promote proactive risk reduction and continuous improvement.
• Run incident management frameworks and drive continuous improvement.
• Oversee end-to-end pipeline architecture and release automation.
• Standardize, secure, and fully automate pipelines.
• Advance continuous integration, delivery, and validation practices.
• Lead Sev1 response, escalation, and recovery efforts.
• Own RCA and advocate for systemic solutions (not just point fixes).
• Introduce AI-driven pipeline optimization and quality gates.
• Integrate AI into monitoring, risk prediction, and CI/CD optimization.
• Drive automation to minimize operational toil and enhance decision-making.
• Bachelor’s degree in Computer Science, Engineering, or a related field.
• 6-10 years of hands-on experience in software operations, DevOps, and Site Reliability Engineering, specifically in managing large-scale, mission-critical systems.
• Clear and confident communication skills with the ability to lead teams and collaborate effectively across engineering, product, and architecture teams.
• Proven track record of ensuring high availability and performance in production environments, with expertise in fault-tolerant, distributed system design.
• Excellent understanding of modern software delivery pipelines and DevOps practices, including CI/CD, configuration management, and version control (Git).
• Exceptional problem-solving skills, with experience diagnosing complex system issues under pressure and driving them to resolution.
• Strong proficiency in at least one programming or scripting language (e.g., Python, Bash, or Java) for automation and tool integration.
• Self-motivated and proactive, with a passion for automating manual processes and continuously improving systems to enhance reliability and team productivity.
• Remote-first / work-from-home culture.
• Flexible vacation policy to help you rest, recharge, and connect with loved ones.
• Paid leave benefits.
• Health, dental, and vision insurance.
• 401(k) retirement savings plan.
• Infertility benefits.
• Tuition reimbursement, life insurance, EAP – and more!