
Senior Manager, DevOps
Posted 1 day ago

This is a fully remote position, open to applicants in California.
• Establish and implement the long-term strategic vision for Infrastructure as Code (IaC), the evolution of CI/CD, and cloud-native architecture to accommodate TrueML's scaling requirements.
• Spearhead the design and deployment of self-service internal platforms aimed at alleviating developer cognitive load, enabling feature teams to deploy and manage services with minimal friction and enhanced velocity.
• Serve as the primary decision-maker for cloud expenditure (AWS); drive cost-optimization initiatives and lead negotiations for the DevOps toolstack and with third-party vendors.
• Ensure that the infrastructure architecture meets stringent High Availability (HA) standards and robust Disaster Recovery (DR) protocols, preserving system integrity across regions.
• Supervise the implementation and advancement of extensive monitoring, logging, and distributed tracing systems, utilizing AIOps to transition from reactive to predictive system maintenance.
• Advocate for security by design by incorporating automated vulnerability scanning, secret management, and compliance checks directly into the automated build pipelines.
• Act as the primary escalation point for significant production outages, facilitating blameless post-mortem reviews focused on systemic enhancements rather than personal mistakes.
• Maintain up-to-date technical expertise in container orchestration (Kubernetes), serverless patterns, and contemporary automation frameworks to offer valuable mentorship and architectural guidance to senior engineering personnel.
• Bachelor's degree in Computer Science, Engineering, or a related technical field, or equivalent practical experience.
• More than 10 years of experience in DevOps, Site Reliability Engineering (SRE), or Software Engineering, with over 5 years in a managerial role overseeing engineers.
• Expert-level proficiency with AWS and experience managing multi-region, high-availability deployments.
• Advanced knowledge of Kubernetes (K8s) and Docker, including cluster management, networking, and scaling within a production environment.
• Proficiency in Terraform to ensure consistency and automation across all infrastructure layers; experience with Atlantis is a plus.
• Extensive experience designing and maintaining complex CI/CD pipelines (GitHub Actions, GitLab CI, or Jenkins) and expertise in scripting languages such as Python, Go, or Bash.
• Practical experience with modern monitoring, observability, and tracing stacks (Datadog, Observe), along with a solid understanding of SRE principles (SLIs/SLOs/Error Budgets).
• Experience serving as an Incident Commander during high-severity outages and promoting a "blameless" post-mortem culture.
• Proven ability to influence executive leadership and collaborate effectively across Product, Engineering, and Security teams.
• Experience integrating AI-assisted productivity tools (Cline, GitHub Copilot) into the engineering workflow to expedite delivery.
• Competitive salary and performance-based bonuses.
• Comprehensive health, dental, and vision insurance.
• Flexible work environment with options for remote work.
• Opportunities for professional development and continuous learning.
• Generous vacation and paid time off policies.