
Senior Site Reliability Engineer
Posted 1 hour ago

• Implement the Observability Ladder and establish SLAs, SLIs, and SLOs.
• Create deployment tools that enable teams to automate rollbacks when error budgets are exhausted.
• Foster a blameless post-mortem culture centered on actionable insights and measurable metrics.
• Continuously enhance alerting and on-call frameworks to minimize alert fatigue.
• Develop systems for verification both before and after deployments.
• Lead the initiative to manage the reliability suite through Infrastructure as Code (IaC) utilizing Terraform.
• Bachelor's degree in Computer Science, Information Technology, or a related discipline.
• Over 5 years of experience in Software Engineering, Site Reliability Engineering (SRE), DevOps, or Platform Engineering.
• Strong coding skills, with proficiency in Python (or a similar programming language).
• Practical experience with AWS and a robust understanding of Infrastructure as Code (Terraform or CloudFormation).
• Proven experience with monitoring tools such as Datadog, Prometheus, or the ELK stack.
• Solid understanding of SRE principles, including Golden Signals and error budget calculations.
• Demonstrated ability to define and enforce reliability standards across multiple teams.
• Flexibility and the option to work remotely.
• A work-life balance that ensures you are not expected to work on weekends or outside of regular hours.
• A progressive remote company that offers virtual social platforms for employee engagement.
• A monthly allowance for working from home.
• A MacBook or Windows laptop to enable you to perform at your best.
• Support for your professional development, along with recognition of your achievements and career advancement.