
Site Reliability Engineer
Posted Jun 20

This is a fully remote position, open to applicants in the United States.
• Leverage your expertise in enterprise-level triage and incident resolution while gaining valuable experience in VA system infrastructure.
• Employ contemporary system monitoring tools to enhance VA enterprise reliability and elevate the quality of services provided to veterans.
• Collaborate with system and application owners to understand existing designs and functionality, applying your knowledge of workflow systems and application processes across multiple system environments. Work with technology and development teams to diagnose outages and propose changes that increase reliability.
• Apply your hardware and software expertise to fortify the systems relied upon by the VA. Your primary focus will be on investigations, collaborating with event management, application owners, DevOps teams, and system and network administrators to analyze issues across enterprise applications and technology stacks.
• Partner with system and application owners to gain insights into their platform designs and operational dynamics across diverse environments. This understanding will assist you in diagnosing outages, tracing workflow challenges, and suggesting enhancements for stability.
• Work together with developers and identity and access teams when in-depth technical investigations are required.
• You will acquire hands-on experience with enterprise-level triage and incident analysis, enriching your understanding of the VA’s infrastructure. Tools such as SolarWinds, Dynatrace, and Splunk will be integral to your daily tasks, providing the visibility necessary to identify reliability issues and support improvements to the services offered to veterans.
• Proven expertise (3+ years) in two or more of the following tools utilized for troubleshooting application logging in an enterprise environment: Dynatrace, Splunk, SolarWinds, ServiceNow Operator Workspace.
• Extensive knowledge of one or more technology areas, including Network, Windows, Desktop, Unix/Linux, AWS or Azure Cloud, WebSphere Middleware, Java/JS Development, and Microsoft or Oracle Database.
• Over 8 years of experience working with key performance indicators for IT system operability, reliability, application performance, and code quality.
• More than 8 years of experience in deploying, maintaining, and troubleshooting complex applications at an enterprise scale, while collaborating with cross-functional teams.
• At least 1 year of experience in service virtualization, as well as in AWS or Azure Cloud technologies, including SaaS and PaaS implementation.
• Proficiency in using Microsoft Office applications, including Word, Excel, and PowerPoint.
• Over 2 years of experience independently leading a team to tackle challenging technical issues.
• High school diploma or GED with 20+ years of relevant professional experience, or an MA or MS degree in computer science, electronics engineering, or a related technical discipline with 10+ years of relevant professional experience.
• Health insurance
• Retirement plans
• Paid time off
• Flexible work arrangements
• Professional development