Remotery

Staff SRE

at Lytx, Inc. | Virginia, US | Full-time | DevOps & Site Reliability Engineer (SRE) | Lead | $183.5k – $232.5k/year

Posted 1 day ago

📋 Description

• Develop tools and frameworks for system monitoring to ensure maximum uptime in production environments.

• Provide mentorship to the SRE team regarding best practices.

• Foster a culture of innovation.

• Take the lead in improving our 24/7 on-call and incident management processes.

• Create and maintain runbooks.

• Participate in the design and documentation of cloud services and Standard Operating Procedures (SOPs).

• Influence service design by collaborating with Architects, DBAs, Developers, DevOps, and Data Engineers to build reliability, scalability, and cost-efficiency into services early in the development lifecycle.

• Facilitate blameless post-mortems.

• Assume responsibility for publishing Root Cause Analysis (RCA) documents for both internal and external stakeholders.

• Collaborate with Service Owners to define Service Level Objectives (SLOs) and establish Service Level Indicators (SLIs) to ensure compliance with Service Level Agreements (SLAs).

• Investigate and assess new cloud technologies and vendor solutions to improve product stability and manageability.

• Minimize operational toil and maintain a high level of automation by adopting Infrastructure as Code (IaC) principles and GitOps methodologies.

• Obtain and maintain a comprehensive understanding of Lytx production services to ensure swift resolution of production incidents.


⛳️ Requirements

• Over 8 years of experience as a Site Reliability Engineer (SRE) in an AWS environment within medium to large-scale organizations.

• More than 6 years of practical experience in implementing and managing Observability tools such as Prometheus, New Relic, Grafana, etc.

• Strong proficiency in programming, preferably in Python, Groovy, and Bash.

• Practical experience in managing both SQL and NoSQL database technologies.

• Over 5 years of experience in building infrastructure deployment pipelines using Git, Terraform, Helm, Jenkins/Jenkins X/Argo CD, etc.

• Proficient in designing production environments on the AWS cloud using a range of AWS services (VPC, EKS, IAM, AMI, EC2, CloudWatch, CloudTrail, Control Tower, GuardDuty, MSK, S3, Glacier, Gateways, Direct Connect, Route 53, RDS, ALB, Auto Scaling, etc.).

• Extensive experience with Linux systems and a variety of protocols and technologies (HTTP, REST, TCP/IP, SSL, DNS, SMTP, SSH, NTP, Load Balancing, SQL/NoSQL, Message Brokers, Nginx, Vault, ELK, etc.).

• Practical experience with Kubernetes and various container and cloud-native technologies.

• Significant experience in participating in, implementing, and managing a 24/7 on-call rotation for the SRE team, including creating runbooks, developing support procedures, and proactively monitoring systems across different geographical areas.

• Capability to perform well under pressure in a technically demanding environment.


🏝️ Benefits

• Medical, dental, and vision insurance.

• Health Savings Account.

• Flexible Spending Accounts.

• Telehealth services.

• 401(k) plan and 401(k) matching.

• Life and Accidental Death & Dismemberment (AD&D) insurance.

• Short-Term and Long-Term Disability coverage.

• Flexible Time Off (FTO) or Paid Time Off (PTO).

• Employee Well-Being program.

• 11 paid holidays plus one inclusive holiday per year.

• Volunteer Time Off.

• Employee Referral program.

• Education Reimbursement Program.

• Employee Recognition and Appreciation program.

• Additional perks and optional benefit programs.
