Lead Reliability Engineer

1 month ago


Santa Clara, California, United States Celestial AI Full time
About Celestial AI

At Celestial AI, we are at the forefront of innovation in AI systems. Our ground-breaking Photonic Fabric technology provides a scalable solution to data transfer bottlenecks, revolutionizing AI system performance and delivering unmatched efficiency.

Lead Reliability Engineer

We are seeking a dynamic Lead Reliability Engineer to drive reliability efforts for datacenter and high-performance computing (HPC) applications at Celestial AI. This pivotal role involves ensuring the robustness and uptime of our systems in demanding operational scenarios.

Key Responsibilities
  • Develop and implement tailored reliability strategies, standards, and processes for datacenter and HPC applications.
  • Lead reliability testing activities, including stress testing and performance degradation analysis.
  • Collaborate with cross-functional teams to integrate reliability considerations into product development processes.
  • Conduct thorough reliability analyses specific to datacenter and HPC applications.
  • Define reliability requirements for new products targeting datacenter and HPC markets.
  • Lead root cause analysis and corrective actions for reliability issues in datacenter and HPC environments.
  • Stay updated on emerging technologies and industry trends to enhance system reliability and performance.
Requirements
  • Bachelor's degree in Engineering or related field; Master's or PhD preferred.
  • 15+ years of experience in reliability engineering for datacenter and high-performance computing.
  • Strong understanding of reliability principles and methodologies.
  • Experience with industry standards and guidelines specific to datacenter and HPC reliability.
  • Proven leadership skills and ability to drive reliability initiatives.
  • Excellent problem-solving and communication skills.
Location
Preferably located in the Bay Area.

Join us at Celestial AI, where innovation meets opportunity. We offer competitive compensation, a collaborative work environment, and the chance to be part of a team shaping the future of high-performance computing.

Celestial AI Inc. is committed to diversity and is an equal opportunity employer.

  • Santa Clara, California, United States Palo Alto Networks Full time

    Job Overview. Company Overview: To comply with U.S. federal government requirements, U.S. citizenship is required for this position. Our Mission: At Palo Alto Networks, our mission is clear: to be the cybersecurity partner of choice, safeguarding our digital existence. We envision a world where each day is safer and more secure than the last. Our foundation is built...


  • Santa Clara, California, United States Palo Alto Networks Full time

    Company Overview. Our Vision: At Palo Alto Networks, our mission is clear: to be the preferred cybersecurity partner, safeguarding our digital lives. We envision a future where each day is safer and more secure than the last. Our foundation is built on challenging the status quo and innovating the cybersecurity landscape. We seek forward-thinkers who are...


  • Santa Clara, California, United States NVIDIA Full time

    NVIDIA has been at the forefront of technological innovation since the introduction of the GPU in 1999, which not only transformed the PC gaming landscape but also redefined modern graphics and parallel computing. Recently, the advent of GPU deep learning has propelled us into a new era of computing, positioning the GPU as the central processing unit for...


  • Santa Clara, California, United States Anello Full time

    About Anello Photonics: ANELLO Photonics is a leading-edge technology company based in Santa Clara, CA. The company has developed integrated photonic system-on-chip technology for next-generation navigation. ANELLO's SiPhOG™ gyroscope is based on its patented photonic integrated circuit technology. The result is a product that is higher performance, much...


  • Santa Clara, California, United States Omnivision Technologies Full time

    Qualifications: Bachelor's degree in Physics, Electrical Engineering, Materials Science, or a related engineering field, with coursework focused on semiconductor physics and electronics. Familiarity with electronic component reliability standards such as JEDEC and AEC-Q100 is advantageous. Experience in wafer-level reliability testing is also beneficial. Key...


  • Santa Clara, California, United States OMNIVISION Full time

    Job Overview We are seeking a Staff Reliability Engineer to join our team at OMNIVISION. The ideal candidate will possess a strong educational background and relevant experience in the field of reliability engineering. Qualifications: A Bachelor’s degree in Physics, Electrical Engineering, Materials Science, or a related engineering field, with...


  • Santa Clara, California, United States Omnivision Technologies Full time

    Qualifications: A Bachelor’s degree in Physics, Electrical Engineering, Materials Science, or a related engineering field, with coursework in semiconductor physics and electronics is required. Familiarity with electronic component reliability standards such as JEDEC and AEC-Q100 is advantageous. Experience in wafer-level reliability testing is also...


  • Santa Clara, California, United States Omnivision Technologies Full time

    Qualifications: A Bachelor’s degree in Physics, Electrical Engineering, Materials Science, or a related engineering field, with coursework focused on semiconductor physics and electronics is required. Familiarity with electronic component reliability standards such as JEDEC and AEC-Q100 is advantageous. Experience in wafer-level reliability testing is also...


  • Santa Clara, California, United States Omnivision Technologies Full time

    Qualifications: Bachelor's degree in Physics, Electrical Engineering, Materials Science, or a related engineering field, with coursework focused on semiconductor physics and electronics. Familiarity with electronic component reliability standards such as JEDEC/AEC-Q100 is advantageous. Experience in wafer-level reliability testing is also beneficial. Key...


  • Santa Clara, California, United States Omnivision Technologies Full time

    Qualifications: Bachelor's degree in Physics, Electrical Engineering, Materials Science, or a related engineering field, with coursework focused on semiconductor physics and electronic systems. Familiarity with electronic component reliability standards such as JEDEC and AEC-Q100 is advantageous. Experience in wafer-level reliability testing is also...


  • Santa Clara, California, United States OMNIVISION Full time

    Job Overview Experience: A Bachelor's degree in Physics, Electrical Engineering, Materials Science, or a related engineering field is required, with coursework focused on semiconductor physics and electronics. Familiarity with electronic component reliability standards such as JEDEC and AEC-Q100 is advantageous. Experience in wafer-level reliability is...


  • Santa Clara, California, United States Omni Vision Inc Full time

    Experience: A Bachelor’s degree in Physics, Electrical Engineering, Materials Science, or a related engineering field is required, with coursework that includes semiconductor physics and electronics. Familiarity with electronic component reliability standards such as JEDEC and AEC-Q100 is advantageous. Experience in wafer-level reliability is also...


  • Santa Clara, California, United States Nvidia Full time

    NVIDIA, a prominent player in the realms of Artificial Intelligence, High-Performance Computing, and Visualization, is on the lookout for a Lead Site Reliability Engineer specializing in HPC storage systems. This role involves collaborating with our team to architect, implement, and enhance on-premises HPC storage solutions while integrating cloud...


  • Santa Clara, California, United States OMNIVISION Full time

    Position Overview We are seeking a Staff Reliability Engineer to join our team at OMNIVISION. The ideal candidate will possess a strong educational background and relevant experience in the field of reliability engineering, particularly in semiconductor technologies. Qualifications: A Bachelor’s degree in Physics, Electrical Engineering,...


  • Santa Clara, California, United States Wipro Full time

    Position: Reliability Test Engineer. Company: Wipro. Overview: • Engage in the Board Level Reliability laboratory setting, establishing functional testing hardware and software for a variety of products, including extensive server systems, while executing diverse functional assessments for GPU/Tegra products; • Develop scripts for automated testing...

  • Reliability Engineer

    2 weeks ago


    Santa Clara, California, United States Innova Solutions Full time

    Innova Solutions is immediately hiring a Reliability Engineer. Position type: Full Time. Duration: Full Time. Location: Santa Clara, CA. As a Reliability Engineer, you will: Minimum Qualifications: an EE education is a must, and board-level debugging experience is a must. Work in the Board Level Reliability lab environment and set up functional test hardware and software for various...


  • Santa Clara, California, United States Johnson & Johnson Full time

    Job Summary: We are seeking a highly skilled Staff Reliability Engineer - Electrical to join our team at Johnson & Johnson Medical Devices Companies. As a key member of our Hardware Team, you will play a critical role in designing and developing the next generation of robotic platforms. Key Responsibilities: Reliability Strategy Implementation: Plan and...


  • Santa Clara, California, United States Innova Solutions Full time

    Innova Solutions is actively seeking a Reliability Engineer. Position Type: Full Time. Location: Santa Clara, CA. As a Reliability Engineer, your responsibilities will include: Key Responsibilities: Engaging in Board Level Reliability laboratory activities, establishing functional test hardware and software for various NV products, including large server...


  • Santa Clara, California, United States Promote Project Full time

    About the Company: Promote Project is at the forefront of innovation, leveraging cutting-edge technology to redefine the landscape of AI and computing. Our mission is to harness the power of advanced computing to create transformative solutions that impact various industries. Position Overview: We are seeking a Manager of Site Reliability Engineering to...