
Principal Customer Reliability Engineer
Posted 1 hour ago

This is a fully remote position, open to applicants in the United States.
• Act as the technical representative for Accela's SaaS Operations organization, collaborating with Engineering, SRE, Database Engineering, Product, Professional Services, and Support teams to drive customer success.
• Oversee technical engagements concerning SaaS implementations, migrations, and ongoing production operations, ensuring consistent and reliable outcomes for clients.
• Collaborate with Professional Services and Support teams to ascertain technical requirements, establish migration strategies, and ensure successful transitions into steady-state operations.
• Enhance and develop operational processes, tools, monitoring, metrics, and alerting capabilities that facilitate customer onboarding and migrations and maintain platform reliability.
• Serve as a senior escalation point for intricate customer issues, utilizing observability tools, application performance monitoring, log analysis, distributed tracing, and metrics to troubleshoot and resolve production challenges.
• Direct cross-functional response efforts for critical customer-impacting incidents and implementation hurdles, coordinating stakeholders to ensure prompt resolution.
• Collaborate with customers to set reliability expectations, communicate service-level commitments, and offer guidance on operational risk and change management practices.
• Work alongside Sales, Professional Services, and Support teams to articulate Accela's service management processes, cloud operations methodologies, and compliance posture, including SOC 2, HIPAA, FedRAMP, StateRAMP, and PCI-DSS requirements.
• Share customer-driven insights and feedback with Product, Engineering, and SRE teams to enhance platform reliability, usability, and operational performance.
• Assist in pre-sales and customer expansion efforts by offering technical expertise regarding reliability, architecture, cloud operations, and compliance.
• Provide technical leadership, mentorship, and best-practice guidance to Customer Reliability Engineers, Site Reliability Engineers, and other technical teams.
• Over 8 years of experience in Production Engineering, Site Reliability Engineering, Cloud Operations, Technical Support Engineering, or related SaaS environments, including customer-facing or escalation leadership roles.
• Strong customer-centric approach and proven ability to communicate effectively with both technical and business stakeholders.
• Practical experience operating and supporting SaaS platforms on Microsoft Azure.
• Familiarity with Kubernetes and contemporary containerized environments.
• Extensive experience utilizing observability and monitoring tools, including APM platforms, distributed tracing, logging, and metrics solutions.
• Strong troubleshooting and root cause analysis skills across application, infrastructure, networking, operating system, and database layers.
• Understanding of Infrastructure-as-Code concepts and tools, especially Terraform.
• Experience in developing automation and operational tools using Python, PowerShell, Bash, or similar scripting languages.
• Proven ability to lead Incident, Problem, and Change Management processes during high-severity customer escalations.
• Exceptional written and verbal communication skills, with experience presenting technical information to customer leadership and executive stakeholders.
• Proficiency in using Git and GitHub-based workflows.
• Flexible time off
• Comprehensive medical, dental, and vision plans
• Family planning benefits
• 401(k) retirement savings plan with company match
• Health savings account with company contributions
• Flexible spending account
• Life, accident, and disability coverage
• Business travel insurance
• Employee assistance programs
• Other well-being benefits