
Senior Site Reliability Engineer
Posted 21 hours ago
This is a fully remote position, open to applicants in Mexico.
• Own end-to-end monitoring and alerting for critical platform components using Datadog, Application Insights, and Grafana.
• Define and track SLOs and SLIs, and drive proactive incident prevention.
• Resolve daily DevOps tickets through scripting and automation (PowerShell, Bash, Azure CLI) to reduce operational toil and increase engineering velocity.
• Manage and improve cloud infrastructure using Terraform.
• Build reusable modules and maintain infrastructure best practices across test and production environments.
• Lead incident response, run blameless post-mortems, and implement follow-up actions to prevent recurrence.
• Work with engineering teams to optimize CI/CD pipelines in Azure DevOps, reducing deployment risk and speeding up delivery.
• Help design SOC 2-compliant infrastructure and contribute to security hardening, vulnerability management, and regulatory compliance.
• Keep up with industry trends, particularly AI-assisted operations, and evaluate new tools to continuously improve infrastructure reliability and performance.
• 5+ years of experience in a DevOps or SRE role supporting mission-critical, highly available systems.
• Strong expertise in Terraform (infrastructure as code) for provisioning and managing cloud infrastructure at scale.
• Hands-on experience with Azure, including compute, networking, storage, and managed services.
• Deep knowledge of monitoring, alerting, and observability tools such as Datadog, Azure Insights, Azure Application Insights, and Grafana, or similar.
• Proficiency in a scripting language such as PowerShell, Python, or Bash for automation and incident resolution.
• Familiarity with Azure DevOps for CI/CD automation, including Repos, Pipelines, and Releases.
• Willingness to take part in early-morning on-call rotations and respond to production incidents as they arise.
• Excellent communication and collaboration skills, and the ability to thrive in a small, fast-paced team.
• Competitive salary
• Remote work options