
Senior Site Reliability Engineer
Posted 1 day ago

This is a fully remote position, open to applicants in the United States.
• The Senior Site Reliability Engineer (SRE) will be responsible for ensuring the operational stability and performance of Juul’s hybrid cloud infrastructure (Nutanix, AWS/GCP).
• This role includes spearheading automation initiatives, designing for reliability, and serving as the primary escalation point for critical incidents to guarantee a scalable and efficient platform.
• Responsible for designing, deploying, and maintaining enterprise-scale Nutanix AHV clusters and Prism Central for managing multiple clusters.
• Requires expert-level proficiency with the Nutanix CLIs (nCLI and aCLI) for advanced operations, troubleshooting, and automation tasks.
• Tasked with developing automation scripts utilizing Nutanix REST APIs, Python SDK, PowerShell, and Terraform for infrastructure-as-code implementation.
• Create and oversee VM templates, golden images, and standardized deployment catalogs to ensure consistent provisioning.
• Design disaster recovery solutions employing Leap, Protection Domains, cross-cluster replication, and metro clustering techniques.
• Implement network micro-segmentation using Nutanix Flow and configure RBAC, encryption, and security hardening measures.
• Lead L3 troubleshooting through advanced diagnostics, log analysis (CVM, Genesis), NCC health checks, and cluster service resolution.
• Responsible for configuring high availability, VM affinity rules, QoS policies, and optimizing performance for mission-critical workloads.
• Manage AHV networking including OVS bridges, VLANs, bonds, LACP, and implement resource reservations and workload balancing.
• Design, deploy, and maintain hybrid cloud infrastructure across Nutanix HCI, AWS, and GCP platforms.
• Architect and implement multi-cloud solutions that ensure high availability, scalability, and effective disaster recovery.
• 8-12+ years of infrastructure experience, including at least 8 years specifically in Nutanix HCI and enterprise cloud (AWS/GCP).
• Expert-level proficiency in Python, PowerShell, Bash scripting, infrastructure-as-code (Terraform/CloudFormation), and container orchestration (Kubernetes, EKS/GKE).
• Proven track record in managing enterprise-scale environments, executing hybrid cloud migrations, overseeing disaster recovery, and handling L3 critical incident management.
• Strong knowledge of networking (TCP/IP, VLANs, routing, VPN), security hardening practices, and compliance frameworks (ITIL).
• A strategic thinker with exceptional analytical skills and troubleshooting capabilities for complex multi-layer infrastructure challenges.
• Excellent communication abilities to effectively convey technical concepts to both executives and non-technical stakeholders.
• Remains composed under pressure during critical outages while maintaining meticulous attention to security, compliance, and configuration management.
• A self-driven continuous learner dedicated to keeping up with evolving cloud technologies and automation advancements.
• Availability for on-call rotations, coupled with strong documentation proficiency and a customer service-oriented mindset.
• Preferred certifications: Nutanix NCP/NCAP, AWS Solutions Architect Professional, AWS DevOps Professional, GCP Professional Cloud Architect, Terraform.
• People. Collaborate with talented, dedicated, and supportive colleagues.
• Equity and performance bonuses. Every employee participates in our collective success.
• Benefits include a cell phone subsidy, commuter assistance, and discounts on JUUL products.
• Comprehensive medical, dental, and vision insurance, along with disability, life insurance, family support, wellness, legal, and employee assistance program benefits.
• 401(k) plan with company matching contributions.
• Biannual discretionary performance bonuses.