Lead Reliability Engineer
4 days ago
Celestial AI is a pioneering company in the fields of Generative AI, data center infrastructure, and high-performance computing. As we navigate the era of Accelerated Computing, we recognize that data center bottlenecks are no longer limited to compute performance but now extend to the system's interconnect bandwidth, memory bandwidth, and memory capacity.
Our Photonic Fabric technology offers a 10X increase in performance and energy efficiency over competitive technologies, making it an ideal solution for our customers' AI accelerators and GPUs. With our technology, customers can seamlessly integrate high-bandwidth, low-power, and low-latency optical interfaces into their systems.
Job Description
We are seeking a highly skilled Lead Reliability Engineer to spearhead reliability efforts specifically tailored for datacenter and high-performance computing applications. The ideal candidate will have a strong background in reliability engineering, with a focus on these critical environments, ensuring the robustness and uptime of our systems in demanding operational scenarios.
Key Responsibilities
- Develop and implement reliability strategies, standards, and processes customized for datacenter and high-performance computing applications.
- Lead reliability testing and qualification activities tailored for datacenter and HPC environments.
- Collaborate closely with cross-functional teams to integrate reliability considerations into product development and deployment processes.
- Conduct thorough reliability analyses specific to datacenter and HPC applications.
- Define reliability requirements and specifications for new products targeting datacenter and HPC markets.
- Lead root cause analysis and corrective actions for reliability issues identified in datacenter and HPC environments.
Qualifications
- Bachelor's degree in Engineering or related field; Master's or PhD degree preferred.
- 15+ years of experience in reliability engineering, with a focus on datacenter and high-performance computing applications.
- Strong understanding of reliability principles, methodologies, and tools relevant to datacenter and HPC environments.
- Experience working with industry standards and guidelines specific to datacenter and HPC reliability.
- Proven ability to lead cross-functional teams and drive reliability initiatives in fast-paced environments.
Celestial AI offers a highly competitive total compensation package, inclusive of a competitive base salary and a generous grant of our valuable early-stage equity. We also offer great benefits, a collaborative and continuous learning work environment, and the opportunity to work with smart and dedicated people engaged in developing the next generation architecture for high-performance computing.
Celestial AI is proud to be an equal opportunity workplace and is an affirmative action employer.
-
Lead Reliability Engineer
2 weeks ago
Santa Clara, California, United States Celestial AI Full time
About Celestial AI: Celestial AI is a pioneering company in the field of artificial intelligence, striving to push the boundaries of innovation and performance. As the industry grapples with the challenges of AI workloads, we are committed to delivering cutting-edge solutions that address the 'Memory Wall' problem and enable unprecedented scalability and...
-
Reliability Engineer
3 days ago
Santa Clara, California, United States Omni Vision Inc Full time
Job Title: Sr. Reliability Engineer. Omni Vision Inc is seeking a highly skilled Sr. Reliability Engineer to join our team. As a key member of our engineering team, you will be responsible for ensuring the quality and reliability of our CMOS Image Sensor products. Key Responsibilities: Review reliability qualification testing results and determine whether our...
-
Reliability Engineer
3 weeks ago
Santa Clara, California, United States Omni Vision Inc Full time
Job Title: Reliability Engineer. Omni Vision Inc is seeking a highly skilled Reliability Engineer to join our team. As a key member of our engineering team, you will be responsible for designing and debugging hardware for biased product reliability evaluation, evaluating and qualifying CMOS Imaging Sensor (CIS) products for mass production, and collaborating...
-
Hardware Reliability Engineer
3 days ago
Santa Clara, California, United States Amazon Full time
Job Description: We are seeking a highly skilled Hardware Reliability Engineer to join our team at Amazon Web Services (AWS). As a key member of our Hardware Engineering team, you will play a critical role in designing and developing cutting-edge compute and storage platforms that enable our cloud services. The successful candidate will have a strong background...
-
Director of Reliability Engineering
3 days ago
Santa Clara, California, United States Ushur Full time
About Ushur: Ushur is a leading provider of Customer Experience Automation solutions, empowering enterprises to deliver delightful customer and employee experiences. Our cutting-edge technologies, including Conversational AI, Machine Learning, and Intelligent Process Automation, enable Fortune 100 companies to automate their customer engagement. The Role: We are...
-
Reliability Engineer
2 weeks ago
Santa Clara, California, United States Palo Alto Networks Full time
About the Role: Palo Alto Networks is seeking a highly motivated and experienced Reliability Engineer to join our team. As a key member of our Hardware Quality and Compliance Engineering team, you will play a critical role in ensuring the quality and reliability of our new products from inception through the first year in production. Key...
-
Reliability Engineer
3 weeks ago
Santa Clara, California, United States Palo Alto Networks Full time
About the Role: Palo Alto Networks is seeking a highly motivated and experienced Reliability Engineer to join our team. As a key member of our Hardware Quality and Compliance Engineering team, you will play a critical role in ensuring the quality and reliability of our new products from inception through the first year in production. You will be responsible for...
-
Reliability Engineer
1 week ago
Santa Clara, California, United States Palo Alto Networks Full time
About the Role: Palo Alto Networks is seeking a highly motivated and experienced Reliability Engineer to join our team. As a key member of our Hardware Quality and Compliance Engineering team, you will play a critical role in ensuring the quality and reliability of our new products from inception through the first year in production. Key...
-
Reliability Engineer
3 weeks ago
Santa Clara, California, United States Palo Alto Networks Full time
Job Title: Principal NPI Reliability Engineer. Palo Alto Networks is seeking an experienced and highly motivated Reliability Engineer to join our team. The successful candidate will take ownership and drive quality and reliability into the company's new products from inception through the first year in production. Key Responsibilities: Establish and maintain...
-
Reliability Engineer
4 days ago
Santa Clara, California, United States Palo Alto Networks Full time
About the Role: Palo Alto Networks is seeking a highly motivated and experienced Reliability Engineer to join our team. As a key member of our Hardware Quality and Compliance Engineering team, you will play a critical role in ensuring the quality and reliability of our new products from inception through the first year in production. Key...
-
Reliability Engineer
4 days ago
Santa Clara, California, United States Palo Alto Networks Full time
About the Role: Palo Alto Networks is seeking a highly motivated and experienced Reliability Engineer to join our team. As a key member of our World-Wide Operations team, you will play a critical role in ensuring the quality and reliability of our new products from inception through the first year in production. Key Responsibilities: Establish and maintain...
-
Reliability Engineer
3 weeks ago
Santa Clara, California, United States Comtech Full time
Reliability/Failure Analysis Engineer. Comtech Telecommunications Corp. is seeking a skilled Reliability/Failure Analysis Engineer to join our team in Santa Clara, CA. As a key member of our technical team, you will collaborate with diverse professionals and interact with customers to provide solutions to technical problems of moderate scope and...
-
Senior System Reliability Engineer
4 weeks ago
Santa Clara, California, United States NVIDIA Full time
Reliability Engineer for NVIDIA's System Products. NVIDIA is a leader in the field of artificial intelligence and high-performance computing, and we're looking for a skilled Reliability Engineer to join our team. As a Reliability Engineer, you will be responsible for ensuring the reliability of our system products, including graphics cards, servers, and data...
-
Reliability Engineer
3 weeks ago
Santa Clara, California, United States Innova Solutions Full time
Job Title: Reliability Engineer. Innova Solutions is seeking a highly skilled Reliability Engineer to join our team. As a Reliability Engineer, you will be responsible for ensuring the reliability and quality of our products. Key Responsibilities: Work in the Board Level Reliability lab environment and set up functional test hardware and software for various NV...
-
Reliability Engineer
4 weeks ago
Santa Clara, California, United States Innova Solutions Full time
Job Title: Reliability Engineer. Innova Solutions is seeking a skilled Reliability Engineer to join our team. As a Reliability Engineer, you will be responsible for ensuring the reliability and quality of our products. Key Responsibilities: Work in the Board Level Reliability lab environment and set up functional test hardware and software for various NV...
-
Site Reliability Engineer
3 weeks ago
Santa Clara, California, United States Cryptoware Technologies Inc Full time
Job Title: Site Reliability Engineer. We are seeking a highly skilled Site Reliability Engineer to join our team at Cryptoware Technologies Inc. As a Site Reliability Engineer, you will be responsible for leading the effort of global expansion of Huobi globe-spanning infrastructure. Key Responsibilities: Lead the effort of global expansion of Huobi...
-
Reliability Engineer
4 weeks ago
Santa Clara, California, United States Innova Solutions Full time
Job Title: Reliability Engineer. We are seeking a highly skilled Reliability Engineer to join our team at Innova Solutions. As a Reliability Engineer, you will play a critical role in ensuring the reliability and quality of our products. Key Responsibilities: Design and implement test plans and procedures to evaluate the reliability of our products. Conduct...
-
Reliability Engineer
3 days ago
Santa Clara, California, United States Comtech Full time
Comtech Telecommunications Corp. is seeking a highly skilled Reliability/Failure Analysis Engineer to join our team in Santa Clara, CA. In this critical role, you will collaborate with a diverse team of technical professionals and interact with outside customers to provide solutions to a variety of technical problems of moderate scope and complexity. Key...
-
Principal Site Reliability Engineer
3 weeks ago
Santa Clara, California, United States Palo Alto Networks Full time
About the Role: Palo Alto Networks is seeking a highly skilled Principal Site Reliability Engineer to join our team. As a key member of our engineering team, you will be responsible for designing, building, and operating reliable, secure cloud infrastructure. Key Responsibilities: Contribute to the success of SRE and DevOps teams. Develop expertise in new...
-
Senior Product Reliability Engineer
3 weeks ago
Santa Clara, California, United States Anello Photonics Full time
About Anello Photonics: Anello Photonics is a leading-edge technology company based in Santa Clara, CA. We have developed integrated photonic system-on-chip technology for next-generation navigation. Our SIPHOG™ gyroscope is based on our patented photonic integrated circuit technology. This innovative technology enables a product that is higher performance,...