
RMA Systems Engineer
Posted Jun 19

Posted Jun 19
This is a fully remote position, open to applicants in Alabama, +32 more states.
• Evaluate GPU/CPU hardware failures by analyzing system logs, performance metrics, firmware behavior, and workload interactions to ascertain the root cause.
• Conduct advanced troubleshooting across hardware, firmware, and operating system layers to confirm faults and ascertain suitable remediation strategies.
• Act as a technical liaison with hardware vendors, delivering comprehensive diagnostic information, interpreting vendor feedback, and influencing resolution strategies.
• Identify and organize on-site hardware remediation actions based on technical assessment, system impact, and priority, including directing vendors and internal teams on suitable repair or replacement measures.
• Assess software and firmware versions to uncover compatibility issues or factors contributing to system failures.
• Validate and ensure system stability and readiness before returning hardware to production environments.
• Detect recurring failure trends and propose enhancements to RMA workflows, vendor processes, and hardware lifecycle management.
• Develop and sustain detailed technical documentation, encompassing failure analysis, troubleshooting techniques, and resolution outcomes.
• Oversee and contribute to technical case documentation within vendor portals, ensuring precision, completeness, and consistency with diagnostic results.
• Proven experience with Jira for managing, tracking, and documenting technical issues, RMA cases, and vendor escalations, while maintaining accurate and audit-ready case records.
• Demonstrated ability to diagnose and troubleshoot complex hardware and system-level challenges in a data center or infrastructure setting.
• Knowledge in RMA workflows, hardware lifecycle management, or infrastructure support.
• Practical experience with NVIDIA and/or AMD GPU technologies.
• Familiarity with analyzing system logs, firmware behavior, and performance data to identify the root cause.
• Proven experience collaborating with hardware vendors and managing technical escalations or support cases.
• Strong grasp of server hardware, system architecture, and data center operations.
• Capability to analyze technical issues, exercise judgment, and determine suitable resolution paths.
• Excellent written and verbal communication skills, especially in documenting technical findings and working with cross-functional teams.
• Proficient with tools such as JIRA, Confluence, and vendor support portals.
• Experience working within microcloud or distributed infrastructure environments, with an understanding of system architecture and hardware integration.
• Background in supporting or analyzing systems in data center environments, focusing on hardware lifecycle, performance, and reliability considerations.
• 100% company-covered insurance premiums for employee medical, dental, and vision plans.
• 401(k) plan with a 100% match up to 4%, featuring immediate vesting.
• Annual Professional Development Reimbursement of $2,500.
• 11 Holidays + Paid Time Off Accrual + Rollover Plan.
• Commitment is valued at Vultr! Increased PTO at the 3-year and 10-year anniversary + 1 month paid sabbatical every 5 years + Anniversary Bonus each year.
• $500 stipend for remote office setup in the first year + $400 each subsequent year.
• Internet reimbursement of up to $75 per month.
• Gym membership reimbursement of up to $50 per month.
• Company-paid Wellable subscription.
Jellyfish
ScalableOS
Pragmatike
Get handpicked remote jobs straight to your inbox weekly.