
Site Reliability Engineer – Senior
Posted May 21

This is a fully remote position, open to applicants in Brazil.
Responsibilities:
• Ensure the availability, scalability, and performance of applications and services;
• Implement and enhance practices of **observability**, including metrics, logs, and traces;
• Create and maintain **dashboards** for monitoring system health indicators;
• Define and manage **alerting systems**, focusing on efficient alerts and reducing noise;
• Help identify and resolve incidents, including root cause analysis (RCA);
• Collaborate with development teams for continuous improvement (DevOps);
• Automate operational routines and monitoring processes;
• Support the definition and tracking of SLIs, SLOs, and SLAs;
• Contribute to the culture of reliability and resilience engineering.
Requirements:
• Experience with **SRE/DevOps** practices;
• Strong knowledge of **observability** (monitoring, logging, and tracing);
• Experience in building **dashboards and visualizing operational data**;
• Experience with **alert management (alerting systems)**;
• Familiarity with monitoring and observability tools, such as:
  • Elastic Stack (Elasticsearch, Logstash, Kibana);
  • Datadog;
  • Splunk;
  • Dynatrace;
• Knowledge of cloud environments (AWS, Azure, or GCP);
• Experience with automation (Python, Shell Script, or similar);
• Understanding of Linux systems and networking;
• Experience with containers and orchestration (Docker/Kubernetes);
• Experience with APM (Application Performance Monitoring) tools;
• Knowledge of infrastructure as code (Terraform, CloudFormation);
• Experience with CI/CD pipelines;
• Familiarity with Chaos Engineering practices;
• Certifications in cloud or SRE;
• Experience with business-oriented observability culture.