About the Role
We are looking for a highly motivated and skilled Incident Response Engineer to join our Facility Operations Center (FOC) team. In this critical role, you will be responsible for coordination and communication within NVIDIA's datacenters, with a specific focus on incident response, vendor support, and maintenance performance. You will be instrumental in ensuring the reliability and availability of our datacenter environments and in minimizing the blast radius of incidents.
What you'll be doing:
+ Your primary role is to coordinate and communicate across NVIDIA's datacenter portfolio, from an operations perspective, on incidents, maintenance, and reporting/monitoring.
+ Develop standards and programs in support of reliability and operations initiatives, including Problem and Change Control, and define and maintain a health score for sites and environments, including testing methods to predict and isolate points of failure, assessing and advising on maintenance strategies, an...