Lead Reliability Engineer
1 month ago
At Celestial AI, we are at the forefront of innovation in AI systems. Our ground-breaking Photonic Fabric technology provides a scalable solution to data transfer bottlenecks, revolutionizing AI system performance and delivering unmatched efficiency.
We are seeking a dynamic Lead Reliability Engineer to drive reliability efforts for datacenter and high-performance computing applications at Celestial AI. This pivotal role involves ensuring the robustness and uptime of our systems in demanding operational scenarios.
Key Responsibilities
- Develop and implement tailored reliability strategies, standards, and processes for datacenter and HPC applications.
- Lead reliability testing activities, including stress testing and performance degradation analysis.
- Collaborate with cross-functional teams to integrate reliability considerations into product development processes.
- Conduct thorough reliability analyses specific to datacenter and HPC applications.
- Define reliability requirements for new products targeting datacenter and HPC markets.
- Lead root cause analysis and corrective actions for reliability issues in datacenter and HPC environments.
- Stay updated on emerging technologies and industry trends to enhance system reliability and performance.
Qualifications
- Bachelor's degree in Engineering or related field; Master's or PhD preferred.
- 15+ years of experience in reliability engineering for datacenter and high-performance computing.
- Strong understanding of reliability principles and methodologies.
- Experience with industry standards and guidelines specific to datacenter and HPC reliability.
- Proven leadership skills and ability to drive reliability initiatives.
- Excellent problem-solving and communication skills.
Preferably located in the Bay Area.
Join us at Celestial AI, where innovation meets opportunity. We offer competitive compensation, a collaborative work environment, and the chance to be part of a team shaping the future of high-performance computing.
Celestial AI Inc. is committed to diversity and is an equal opportunity employer.
-
Lead Site Reliability Engineer
20 hours ago
Santa Clara, California, United States Palo Alto Networks Full time Job Overview. Company Overview: To comply with U.S. federal government requirements, U.S. citizenship is required for this position. Our Mission: At Palo Alto Networks, our mission is clear: To be the cybersecurity partner of choice, safeguarding our digital existence. We envision a world where each day is safer and more secure than the last. Our foundation is built...
-
Lead NPI Reliability Engineer
6 days ago
Santa Clara, California, United States Palo Alto Networks Full time Company Overview. Our Vision: At Palo Alto Networks, our mission is clear: To be the preferred cybersecurity partner, safeguarding our digital lives. We envision a future where each day is safer and more secure than the last. Our foundation is built on challenging the status quo and innovating the cybersecurity landscape. We seek forward-thinkers who are...
-
Lead Systems Reliability Engineer
6 days ago
Santa Clara, California, United States NVIDIA Full time NVIDIA has been at the forefront of technological innovation since the introduction of the GPU in 1999, which not only transformed the PC gaming landscape but also redefined modern graphics and parallel computing. Recently, the advent of GPU deep learning has propelled us into a new era of computing, positioning the GPU as the central processing unit for...
-
Senior Product Reliability Engineer
4 weeks ago
Santa Clara, California, United States Anello Full time About Anello Photonics: ANELLO Photonics is a leading-edge technology company based in Santa Clara, CA. The company has developed integrated photonic system-on-chip technology for next generation navigation. ANELLO's SIPHOG™ gyroscope is based on its patented photonic integrated circuit technology. The result is a product that is higher performance, much...
-
Senior Reliability Engineer
6 days ago
Santa Clara, California, United States Omnivision Technologies Full time Qualifications: Bachelor's degree in Physics, Electrical Engineering, Materials Science, or a related engineering field, with coursework focused on semiconductor physics and electronics. Familiarity with electronic component reliability standards such as JEDEC and AEC-Q100 is advantageous. Experience in wafer-level reliability testing is also beneficial. Key...
-
Principal Reliability Engineer
6 days ago
Santa Clara, California, United States OMNIVISION Full time Job Overview: We are seeking a Staff Reliability Engineer to join our team at OMNIVISION. The ideal candidate will possess a strong educational background and relevant experience in the field of reliability engineering. Qualifications: A Bachelor’s degree in Physics, Electrical Engineering, Materials Science, or a related engineering field, with...
-
Senior Reliability Engineer
6 days ago
Santa Clara, California, United States Omnivision Technologies Full time Qualifications: A Bachelor’s degree in Physics, Electrical Engineering, Materials Science, or a related engineering field, with coursework in semiconductor physics and electronics is required. Familiarity with electronic component reliability standards such as JEDEC and AEC-Q100 is advantageous. Experience in wafer-level reliability testing is also...
-
Senior Reliability Engineer
6 days ago
Santa Clara, California, United States Omnivision Technologies Full time Qualifications: A Bachelor’s degree in Physics, Electrical Engineering, Materials Science, or a related engineering field, with coursework focused on semiconductor physics and electronics is required. Familiarity with electronic component reliability standards such as JEDEC and AEC-Q100 is advantageous. Experience in wafer-level reliability testing is also...
-
Senior Reliability Engineer
6 days ago
Santa Clara, California, United States Omnivision Technologies Full time Qualifications: Bachelor's degree in Physics, Electrical Engineering, Materials Science, or a related engineering field, with coursework focused on semiconductor physics and electronics. Familiarity with electronic component reliability standards such as JEDEC/AEC-Q100 is advantageous. Experience in wafer-level reliability testing is also beneficial. Key...
-
Senior Reliability Engineer
6 days ago
Santa Clara, California, United States Omnivision Technologies Full time Qualifications: Bachelor's degree in Physics, Electrical Engineering, Materials Science, or a related engineering field, with coursework focused on semiconductor physics and electronics. Familiarity with electronic component reliability standards such as JEDEC and AEC-Q100 is advantageous. Experience in wafer-level reliability testing is also beneficial. Key...
-
Senior Reliability Engineer
6 days ago
Santa Clara, California, United States Omnivision Technologies Full time Qualifications: Bachelor's degree in Physics, Electrical Engineering, Materials Science, or a related engineering field, with coursework focused on semiconductor physics and electronic systems. Familiarity with electronic component reliability standards such as JEDEC and AEC-Q100 is advantageous. Experience in wafer-level reliability testing is also...
-
Senior Reliability Engineer
18 hours ago
Santa Clara, California, United States OMNIVISION Full time Job Overview. Experience: A Bachelor's degree in Physics, Electrical Engineering, Materials Science, or a related engineering field is required, with coursework focused on semiconductor physics and electronics. Familiarity with electronic component reliability standards such as JEDEC and AEC-Q100 is advantageous. Experience in wafer-level reliability is...
-
Reliability Assurance Engineer
23 hours ago
Santa Clara, California, United States Omni Vision Inc Full time Experience: A Bachelor’s degree in Physics, Electrical Engineering, Materials Science, or a related engineering field is required, with coursework that includes semiconductor physics and electronics. Familiarity with electronic component reliability standards such as JEDEC and AEC-Q100 is advantageous. Experience in wafer-level reliability is also...
-
Santa Clara, California, United States Nvidia Full time NVIDIA, a prominent player in the realms of Artificial Intelligence, High-Performance Computing, and Visualization, is on the lookout for a Lead Site Reliability Engineer specializing in HPC storage systems. This role involves collaborating with our team to architect, implement, and enhance on-premises HPC storage solutions while integrating cloud...
-
Santa Clara, California, United States OMNIVISION Full time Position Overview: We are seeking a Staff Reliability Engineer to join our team at OMNIVISION. The ideal candidate will possess a strong educational background and relevant experience in the field of reliability engineering, particularly in semiconductor technologies. Qualifications: A Bachelor’s degree in Physics, Electrical Engineering,...
-
Reliability Assurance Engineer
6 days ago
Santa Clara, California, United States Wipro Full time Position: Reliability Test Engineer. Company: Wipro. Overview: • Engage in the Board Level Reliability laboratory setting, establishing functional testing hardware and software for a variety of products, including extensive server systems, while executing diverse functional assessments for GPU/Tegra products; • Develop scripts for automated testing...
-
Reliability Engineer
2 weeks ago
Santa Clara, California, United States Innova Solutions Full time Innova Solutions is immediately hiring a Reliability Engineer. Position type: Full Time. Duration: Full Time. Location: Santa Clara, CA. Minimum Qualifications: EE education and board-level debugging experience are required. As a Reliability Engineer, you will: Work in the Board Level Reliability lab environment and set up functional test hardware and software for various...
-
Staff Reliability Engineer
22 hours ago
Santa Clara, California, United States Johnson & Johnson Full time Job Summary: We are seeking a highly skilled Staff Reliability Engineer - Electrical to join our team at Johnson & Johnson Medical Devices Companies. As a key member of our Hardware Team, you will play a critical role in designing and developing the next generation of robotic platforms. Key Responsibilities: Reliability Strategy Implementation: Plan and...
-
Senior Reliability Engineer
6 days ago
Santa Clara, California, United States Innova Solutions Full time Innova Solutions is actively seeking a Reliability Engineer. Position Type: Full Time. Location: Santa Clara, CA. As a Reliability Engineer, your responsibilities will include: Key Responsibilities: Engaging in Board Level Reliability laboratory activities, establishing functional test hardware and software for various NV products, including large server...
-
Site Reliability Engineering Manager
6 days ago
Santa Clara, California, United States Promote Project Full time About the Company: Promote Project is at the forefront of innovation, leveraging cutting-edge technology to redefine the landscape of AI and computing. Our mission is to harness the power of advanced computing to create transformative solutions that impact various industries. Position Overview: We are seeking a Manager of Site Reliability Engineering to...