Remotery

Senior Site Reliability Engineer II – Infrastructure, AI Native

Posted 5 days ago

This is a fully remote position, open to applicants in Canada.

📋 Description

• Enhancing and managing our infrastructure and services with AI (Claude Code) as an integral partner in your daily development processes.

• Providing clear guidance on technical direction and strategy, while documenting these insights for team alignment.

• Mentoring and guiding fellow engineers within the team.

• Taking ownership of and resolving intricate infrastructure challenges, including Kubernetes scheduling nuances, networking issues, cross-service cascading failures, and AWS platform concerns escalated by other engineers.

• Engaging in a shared on-call schedule (approximately one week every six to eight weeks).

• Estimating timelines and breaking down tasks into manageable one- to three-day segments.

• Promoting cloud cost efficiency by pinpointing over-provisioned resources, optimizing EC2 and container workloads, and developing tools to identify cost anomalies before they escalate.


⛳️ Requirements

• Bachelor’s degree in Computer Science, Engineering, a related field, or equivalent practical experience.

• Extensive experience (5+ years) managing medium to large-scale deployments on AWS (~5000 instances, 50+ accounts), or a comparable environment.

• Over 3 years of programming experience in Java, Python, or another general-purpose programming language.

• Significant Kubernetes experience (3+ years) deploying and managing workloads at scale (hundreds of Deployments, 10k+ containers, 20k+ cores).

• Proficient understanding of container orchestration and microservices.

• Familiarity with service discovery/service mesh technologies.

• Strong Linux administration skills, along with shell/bash scripting expertise.

• Advanced experience with Infrastructure as Code tools (Terraform, CloudFormation) and configuration management/provisioning tools (Ansible, Chef, etc.).

• Solid experience with build, automation, and CI/CD practices.

• In-depth knowledge and experience with networking and load-balancer technologies.

• Familiarity with open-source projects such as Consul, Docker, ArgoCD, Nexus, and Jenkins.

• Experience with large-scale Kafka implementations.

• Database knowledge is an advantage.

• Exceptional troubleshooting abilities, proficiency with monitoring tools, and meticulous attention to detail.

• Outstanding interpersonal skills and a highly collaborative working approach.

• Practical experience using AI coding tools (Claude Code, Cursor, or similar) for infrastructure scripting, incident response automation, or tooling development.


🏝️ Benefits

• Competitive salary and comprehensive benefits package.

• Medical, dental, vision, life, and disability insurance options.

• RRSP plan featuring a DPSP company matching program.

• Employee Assistance Program (EAP) focused on mental well-being.

• Flexible paid time off along with several company-wide holidays throughout the year.

• Week-long synchronized company shutdowns during Winter and Summer.

• Opportunities for Learning & Development programs.

• Provision of equipment, tools, and reimbursement support to foster a productive remote working environment.

• Complimentary Life360 Platinum Membership for your chosen circle.

• Free Tile products.
