
DevOps Engineer
Posted Jun 20

This is a fully remote position, open to applicants in the United States.
• Take ownership of and enhance the infrastructure and operational systems that underpin Clever's engineering teams.
• Create more secure deployment pathways featuring automated health checks, documented rollback procedures, and clear release visibility for high-risk services where the architecture allows.
• Enhance production reliability across infrastructure, data stores, queues, background jobs, logs, alerts, and supporting services.
• Lead or assist in infrastructure migrations, production cutovers, and environment modifications across Heroku, AWS, ECS, RDS, networking, and DNS.
• Improve observability and incident response through enhanced alerts, dashboards, escalation paths, post-incident reviews, runbooks, and operational drills.
• Establish practical reliability targets, service ownership expectations, and production-readiness criteria for critical systems, internal tools, integrations, and AI-driven workflows.
• Collaborate with engineers on infrastructure-sensitive application tasks, including scaling, error management, deployment risk, data access, rollback planning, and minor backend or internal-tooling modifications.
• Enhance SOC2 and security readiness by refining infrastructure controls, access patterns, vulnerability management, evidence gathering, and repeatable operational processes.
• Optimize local development and staging environments to facilitate smoother onboarding, testing, and validation of changes for engineers.
• Over 5 years of experience in DevOps, SRE, infrastructure engineering, platform engineering, or backend engineering.
• Significant production experience with cloud infrastructure, particularly AWS services such as ECS, RDS, networking, IAM, and DNS, and related operational practices.
• Experience in operating or migrating production systems on Heroku, AWS, or comparable platforms.
• In-depth knowledge of CI/CD, deployment automation, rollback strategies, environment management, and release safety.
• Strong experience in observability and incident response, including logs, metrics, traces, dashboards, alerting, escalation paths, runbooks, and post-incident analysis.
• Familiarity with infrastructure-as-code, repeatable infrastructure templates, scripting, and automation.
• Proficient in reading backend application code and making well-scoped changes to backend or internal tools when infrastructure and application behavior intersect.
• Strong judgment in security and access management, encompassing secrets management, environment controls, and least-privilege practices.
• A strong sense of ownership with a proactive approach, clear execution, and practical systems thinking.
• Excellent written communication and business acumen, particularly for runbooks, technical decisions, operational standards, incident follow-up, and linking infrastructure efforts to company impact.
• Medical, dental, vision, and life insurance.
• 18 days of paid time off (increasing with tenure) along with 10 paid holidays.
• Annual budget allocated for learning and career advancement.
• Paid sabbaticals to commemorate significant milestones.
• Exclusive access to Clever homeownership benefits.
• Support for your remote working environment.
• Retirement plan managed through Guideline.
• 6 to 12 weeks of paid parental leave.
• Complimentary counseling sessions and optional weekly meditation.