Remotery

Incident Response Engineer – Facility Operations Center

Posted 6 days ago

This is a fully remote position, open to applicants in India.

📋 Description

• Facilitate coordination and communication across NVIDIA’s datacenter portfolio on operational matters: incidents, maintenance, and reporting/monitoring.

• Establish standards and programs to support reliability and operations initiatives, including Problem and Change Control. Define and maintain a health score for sites and environments, develop testing methods to predict and isolate points of failure, evaluate and advise on maintenance strategies, and provide related reporting and metrics.

• Analyze failure data and collaborate with machine learning and AI teams and tools to forecast future failures, while assisting in reliability studies such as critical assessments, RAM models, and RCM studies.

• Identify and implement automation and process improvement opportunities throughout catalog quality workflows and reporting.

• Coordinate disaster recovery testing, participate in audits, and work with internal partners to ensure business continuity and compliance.

• Conduct risk assessments to ensure adherence to policies, procedures, rules & regulations, and data center standards.

• Own and present key business metrics for incident response, and own and represent the related internal and external tools.

• Lead root cause analysis for outages and modify documentation, workflows, and operating procedures to prevent future incidents.

• Evaluate process improvement and transformation opportunities and partner with process owners and collaborators to define project scopes, problem statements, objectives, and structure teams.

• Collaborate cross-functionally with team members and groups within the organization, building robust, productive relationships across peer organizations that advance the organization’s business objectives.


⛳️ Requirements

• A Bachelor’s degree in a related field (e.g., Electrical Engineering, Mechanical Engineering, Industrial Engineering, Computer Engineering, Telecommunication Engineering, Computer Science, or a business-related discipline) or equivalent experience.

• Over 5 years of operations or environmental, health, and safety experience in data centers.

• Skilled in developing and driving reliability activities (modeling predictions, life cycle testing, stress testing, etc.).

• Strong commercial and financial awareness, with a comprehensive understanding of the impact of failures on business costs, production targets, and customer order fulfillment.

• Highly developed numeracy, statistical, and reporting skills; adept at analyzing, interpreting, and applying information, data, and trends.

• Motivated to achieve objectives and stay organized, with the ability to plan strategically to meet established goals.

• Meticulous and organized, with a proven ability to synthesize data analyses for presentation to large audiences.

• Proficient in utilizing asset databases and DCIM solutions to extract data and generate meaningful insights.

• Experience in designing, deploying, or maintaining large-scale datacenter infrastructure (whether ACSMEP or networking) or the capability to create strategic infrastructure roadmaps that include on-premise, hybrid, and cloud technologies.

• Demonstrated knowledge and advanced proficiency in using Microsoft Office Suite and G-Suite software.


🏝️ Benefits

• Health insurance

• Professional development opportunities
