
Staff Site Reliability Engineer
Posted May 6

• Collaborate with engineering and product teams to design and build the infrastructure and services needed to deliver new features to customers in a cloud-native, event-driven manner.
• Advance our Infrastructure as Code (IaC) with Terraform and our configuration management (CM) with Helm, so that engineering teams get better scaling and self-service functionality.
• Identify and eliminate bottlenecks in production systems across AWS services and our Kubernetes platform.
• Maintain a customer-facing uptime of 99.99%.
• Continuously improve our platform's monitoring and alerting so that we can act proactively rather than reactively.
• A minimum of 8 years of professional experience in SRE/DevOps, with a proven track record of working on high-volume production systems.
• Proven expertise in systems architecture, with a record of solving complex technical challenges and implementing solutions across the organization.
• In-depth knowledge of AWS services and technologies, including ALB/ELB, IAM permissions, DynamoDB, SNS, EKS/Fargate, etc.
• Experience with Infrastructure as Code and configuration management tools, particularly Terraform and Helm charts, for designing and provisioning new services.
• Proficiency in Python, Bash, or other scripting languages; familiarity with Ruby or Golang is a plus.
• Strong sense of ownership and motivation to collaborate with others to bring improvements to production.
• Health insurance
• Pharmacy benefits
• Optical care benefits
• Dental care benefits
• Paid time off
• Sick time off
• Short-term disability coverage
• Long-term disability coverage
• Life insurance
• 401(k) contribution
Arctiq