
SRE Specialist I
Posted May 22

This is a fully remote position, open to applicants in Brazil.
Responsibilities
• Deliver operational support while enhancing the reliability, performance, and scalability of applications within AWS environments.
• Establish, implement, and track SLIs, SLOs, and SLAs to guarantee the high availability of services.
• Automate the provisioning of infrastructure through Infrastructure as Code (IaC).
• Create and sustain pipelines and automation processes that minimize manual intervention and bolster system resilience.
• Actively engage in monitoring, observability, incident response, and post-mortem evaluations to foster continuous improvement.
• Work collaboratively with development, architecture, and security teams to advocate for reliability best practices.
• Examine metrics, logs, and traces to pinpoint bottlenecks and identify opportunities for optimization.
• Take part in on-call rotations and critical incident response activities.
Requirements
• Proven experience as an SRE, DevOps, or infrastructure engineer in mission-critical environments.
• In-depth knowledge of AWS, covering services such as EC2, ECS/EKS, RDS, S3, VPC, CloudWatch, and more.
• Practical experience with Terraform for the provisioning and management of infrastructure.
• Familiarity with Ansible for configuration automation and orchestration.
• Extensive experience in observability and monitoring, utilizing tools like Dynatrace and Datadog.
• Understanding of high availability and fault-tolerance principles.
• Experience working with distributed and scalable architectures.
• Expertise in monitoring based on metrics, logs, and traces.
• Knowledge of Linux and advanced troubleshooting skills in distributed systems.
Benefits
• Comprehensive health benefits.
• Flexible work arrangements.
• Opportunities for professional development.