Reliability, Availability and Serviceability Expert, Datacenter AI Products Development
Found in: beBee S US - 1 week ago
What you’ll be doing:
Serve as the focal-point SME for manufacturing test requirements, test methodology, test plans, and test flows for AI system RAS/Resilience features, ensuring good test coverage and successful production ramp-ups.
Own the AI system RAS/Resilience models, benchmarking, and risk assessment.
Own the troubleshooting and root-causing of AI system RAS/Resilience-related failures at the factory and in the field.
Drive end-to-end RAS efforts across the chip, board, and system levels to reduce FIT rates.
Lead the data analysis of RAS/Resilience logs to refine, revise and overhaul test methodology and manufacturing flows; influence and drive software tools/infrastructure required for new product development, validation, and productization.
Partner closely with architecture, hardware, software, and product engineering teams throughout the product development lifecycle.
Be ready to be challenged to assess new hardware features and architect manufacturing RAS tests, flows, and methodologies.
Develop a deep understanding of NVIDIA's AI hardware and software architecture.
What we need to see:
BS or higher in EE, CE, CS, Mathematics, or equivalent experience.
12+ years of proven hands-on experience in the design, testing, benchmarking, and risk assessment of system RAS/Resiliency features for large compute, AI, or HPC systems.
Proficient in Compute System RAS/Resilience model theory and methodology.
Proficient in HPC or AI system architecture and Cluster Interconnect technologies.
Proficient in using test equipment, Linux commands, and benchmark utilities to test and troubleshoot compute system RAS and Resiliency features.
Strong problem-solving and troubleshooting expertise, with experience institutionalizing root-cause analysis.
Self-motivated, with strong interpersonal skills and the flexibility to adapt to new technologies.
Solid knowledge of and/or experience in HPC or MLPerf benchmarking is a plus.
NVIDIA is widely considered to be one of the technology world’s most desirable employers. We have some of the most forward-thinking and hardworking people in the world working for us. If you're creative and autonomous, we want to hear from you.
The base salary range is 188,000 USD - 356,500 USD. Your base salary will be determined based on your location, experience, and the pay of employees in similar positions.
You will also be eligible for equity and benefits. NVIDIA accepts applications on an ongoing basis.
NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.
-
Reliability, Availability and Serviceability Expert, Datacenter AI Products Development
Found in: beBee jobs US - 2 days ago
Santa Clara, California, United States NVIDIA Full time
For two decades, we have pioneered visual computing, the art and science of computer graphics - with our invention of the GPUs, the engine of modern AI technologies, the field has expanded to encompass AI-powered video games, social networking and web search, IC & other product design, medical diagnosis, and scientific research. Today, visual computing is...