
Senior Site Reliability Engineer
Posted 7 hours ago

This is a fully remote position, open to applicants in the United States.
• Oversee and sustain cloud infrastructure on Azure, which includes Azure Kubernetes Service (AKS) clusters and associated resources.
• Develop, enhance, and uphold CI/CD pipelines utilizing GitHub Actions to ensure dependable and repeatable deployments.
• Take ownership of and improve our Grafana implementation; design dashboards, configure alerts, and assist with incident management workflows.
• Monitor system performance, manage incidents, and conduct root cause analysis to prevent recurrence.
• Work alongside development teams to establish and monitor SLIs, SLOs, and error budgets that align with business objectives.
• Contribute to infrastructure-as-code methodologies using Pulumi.
• Identify and mitigate reliability risks through capacity planning, performance optimization, and proactive system enhancements.
• Participate in an on-call rotation to support production systems and address incidents.
• Document runbooks, operational processes, and architectural decisions to facilitate team knowledge sharing.
• 5+ years of experience in a Site Reliability Engineering, DevOps, or Infrastructure position.
• 3+ years of experience with infrastructure as code.
• 2+ years of experience in designing CI/CD pipelines and cloud-based infrastructure.
• Strong practical experience with Azure Cloud services and resource management.
• Proficiency in Kubernetes and AKS administration, including deployments, networking, and troubleshooting.
• Experience with GitHub Actions for the development and maintenance of CI/CD pipelines.
• 3+ years of experience with Grafana or similar tools, including dashboard creation, alerting configuration, and incident management.
• Hands-on experience with Prometheus, Loki, or other observability tools within the Grafana ecosystem.
• Proficiency in at least one scripting or programming language, such as Python or Bash.
• Understanding of networking fundamentals, DNS, load balancing, and concepts related to container orchestration.
• Strong analytical and communication abilities; capable of diagnosing complex system issues and articulating findings clearly.
• Proven ability to collaborate across teams and foster a culture of reliability.
• Experience in an agile environment with contemporary DevOps practices.
• 100% remote work opportunity.
• Health insurance through Aetna, with 100% of premiums covered.
• Dental and vision insurance through Guardian, with 100% of premiums covered.
• Basic life insurance with 100% of premiums covered.
• Access to a flexible spending account (FSA), or a health savings account (HSA) for those on HSA-eligible plans.
• 401(k) plan with a 4% match and immediate vesting; employees must be at least 21 years old to participate.
• Flexible PTO policy that provides employees with up to 4 weeks of PTO in their first 12 months; thereafter, PTO usage aligns with company standards and typically does not exceed 5 weeks per calendar year.
• 12 company-paid holidays each year.
• Annual stipend for continuing education.