
Lead Engineer, DevOps – SRE
Posted 4 hours ago

• Take charge of and enhance Launch Potato's cloud infrastructure, CI/CD platform, and compliance framework.
• Establish the Site Reliability Engineering (SRE) function from the ground up, enabling product teams to accelerate shipping without sacrificing reliability, security, or budget control.
• Build out the SRE practice: on-call rotation, PagerDuty setup, SLA/SLO definitions for essential infrastructure services, a runbook library, and observability dashboards that connect site performance to business metrics.
• Complete the AWS multi-account migration by transitioning production workloads to an isolated account with zero unplanned downtime.
• Produce a SOC 2 Type I audit-ready infrastructure evidence package, overseeing the technical controls implementation from start to finish.
• Version and publish the Terraform module library (30+ modules) to a private registry, so product teams no longer pull modules ad hoc from git.
• Implement automated deployment rollback for ECS and Lambda, and gate production deployments on passing integration tests.
• Establish monthly cost reporting for leadership, including budget anomaly detection, savings plan recommendations, and expenditure by service/team/environment.
• A minimum of 5 years of experience in production AWS infrastructure with substantial expertise in Terraform.
• Proven experience in building an SRE function from the ground up, with complete ownership of the process.
• Familiarity with a multi-site organization where PaaS or microservices are essential.
• Previous ownership of CI/CD pipelines in one or more roles.
• Experience with PagerDuty and establishing an on-call rotation.
• Profit-sharing bonus
• Competitive benefits