
Site Reliability Engineer – Level 3
Posted Jun 20

This is a fully remote position, open to applicants in Brazil.
• Spearhead the design and implementation of scalable and dependable systems.
• Create sophisticated automation strategies aimed at minimizing manual tasks.
• Perform thorough postmortems and establish long-term solutions.
• Guide junior engineers and advocate for best practices across various teams.
• Enhance incident response protocols and drive reductions in MTTR (Mean Time to Recovery).
• Streamline cloud infrastructure costs and optimize resource usage.
• Shape the SRE culture and promote process enhancements.
• Over 5 years of experience in SRE, DevOps, or software engineering positions.
• Strong programming skills in Python, Go, or Java.
• Experience with scalable and distributed systems.
• Familiarity with monitoring and logging tools (Grafana, ELK stack, Splunk, etc.).
• Knowledge of containerization and orchestration technologies (Docker, Kubernetes).
• Advanced experience in cloud automation (AWS, Azure, GCP).
• Understanding of CI/CD pipelines and version control systems.
• Knowledge of networking, databases, and storage architectures.
• Familiarity with incident management tools (e.g., xMatters, PagerDuty, Opsgenie).
• Experience in ensuring production reliability for real-time systems, score computation services, or policy engines.
• Comprehensive and competitive benefits package.
• Health insurance coverage.
• Opportunities for professional development.