Senior Reliability Engineer

4 weeks ago


Santa Clara, California, United States NVIDIA Full time
Job Title: Senior Reliability Engineer

NVIDIA is a leader in computer graphics, PC gaming, and accelerated computing. We are seeking a highly skilled Senior Reliability Engineer to join our team.

Job Summary:

We are looking for a talented individual with expertise in HTOL stress testing, JEDEC standards, and thermal management techniques. The successful candidate will be responsible for developing, debugging, and managing test programs for the HTOL oven, reviewing and designing HTOL board schematics, and collaborating with multi-functional teams to debug and resolve hardware/software product issues.

Key Responsibilities:
  • Developing, debugging, and managing test programs for the HTOL oven
  • Reviewing and designing HTOL board schematics for various ovens
  • Diagnosing signal integrity issues on HTOL boards with complex vectors and test patterns
  • Collaborating with socket vendors to develop intricate socket designs and unique cooling solutions
  • Leading and providing technical guidance to lab technicians and various engineering groups to ensure proper and seamless bring-up
  • Working continuously with vendors to improve thermal interface materials and implement enhancements to burn-in board design and systems
  • Implementing and improving temperature sense methods for device temperature monitoring and feedback control
  • Collaborating with multi-functional teams to debug and resolve any hardware/software product issues
  • Maintaining and enhancing the reliability database and documentation

Requirements:
  • Strong expertise in HTOL stress testing and JEDEC standards
  • Knowledge of other reliability environmental stress tests such as Temperature Cycling (TC), Reflow, Thermal Shock, and HAST
  • Experience working with MCC HTOL ovens, including operation/debug/maintenance of the oven and hands-on repair experience with electrical and mechanical problems
  • Experience with dual or multi-die packages, related designs, and effective thermal control
  • Ability to use oscilloscopes and current probes to analyze device current behavior while running different vectors
  • Capable of performing preventive and corrective maintenance on HTOL ovens and refrigeration systems, as well as basic soldering tasks
  • Knowledge of thermal management techniques for HTOL environments and of toggle coverage associated with running different vectors (recommended)
  • Skilled in vector debugging and in developing and modifying test scripts
  • Excellent communication and collaboration skills
  • Self-motivated, proactive, and able to work independently
  • Master's or bachelor's degree in Electrical Engineering, Mechanical Engineering, or a related field (or equivalent experience)
  • Minimum 5 years of experience in HTOL test system operation and data analysis for semiconductor devices

What We Offer:

NVIDIA offers a highly competitive salary range of $108,000 - $172,500, as well as a comprehensive benefits package. We are an equal opportunity employer and value diversity in our current and future employees.



  • Santa Clara, California, United States NVIDIA Full time

    Senior Reliability Engineer: NVIDIA is seeking a highly skilled Senior Reliability Engineer to join our team. As a key member of our engineering team, you will be responsible for planning and implementing the qualifications of new NVIDIA products, including IC chips in AI, Mobile, Automotive, Deep Learning, Graphic Processor, and System on Chip sectors. Key...


  • Santa Clara, California, United States NVIDIA Full time

    Job Title: Senior Cloud Reliability Engineer. We are seeking a highly motivated Senior Cloud Reliability Engineer to join our Embedded organization. This team is responsible for automating, deploying, and maintaining infrastructure for various NVIDIA AI workflows and applications such as Metropolis, ACE, and Riva hosted in the cloud. The Senior Cloud Reliability...


  • Santa Clara, California, United States NVIDIA Full time

    Job Title: Senior Site Reliability Engineer. NVIDIA is seeking a highly skilled Senior Site Reliability Engineer to join our team. As a key member of our engineering organization, you will be responsible for designing, implementing, and supporting operational and reliability aspects of large scale Kubernetes clusters. Key Responsibilities: Design and implement...


  • Santa Clara, California, United States Anello Photonics Full time

    About Anello Photonics: Anello Photonics is a leading-edge technology company based in Santa Clara, CA. The company has developed integrated photonic system-on-chip technology for next-generation navigation. ANELLO's SIPHOG™ gyroscope is based on its patented photonic integrated circuit technology. The result is a product that is higher performance, much...


  • Santa Clara, California, United States NVIDIA Full time

    About the Role: NVIDIA is seeking a highly skilled Senior Site Reliability Engineer to join our team. As a key member of our engineering organization, you will be responsible for designing, implementing, and supporting operational and reliability aspects of our large-scale Observability & Telemetry collection platform. You will engage in the entire lifecycle of...


  • Santa Clara, California, United States NVIDIA Full time

    About NVIDIA: NVIDIA is a leader in the field of artificial intelligence, machine learning, and datacenter acceleration. Our company has a rich history of innovation, with a legacy that dates back to the invention of the GPU in 1999. This groundbreaking technology sparked the growth of the PC gaming market, redefined modern computer graphics, and...


  • Santa Clara, California, United States Palo Alto Networks Full time

    Job Overview: Palo Alto Networks is seeking a highly skilled Cloud Infrastructure Engineer to join our CDL/SLS team. As a Senior Staff Site Reliability Engineer, you will be responsible for designing, building, and operating reliable and secure cloud infrastructure. Our team is at the forefront of innovation, constantly pushing the boundaries of what is...


  • Santa Clara, California, United States Palo Alto Networks Full time

    About the Role: We are seeking a highly skilled Senior Staff Site Reliability Engineer to join our team at Palo Alto Networks. As a key member of our Cloud Infrastructure team, you will be responsible for designing, building, and operating reliable and secure cloud infrastructure. Our ideal candidate will have a strong background in cloud computing, with...


  • Santa Clara, California, United States NVIDIA Full time

    At NVIDIA, we're seeking a highly skilled Senior Cloud Reliability Engineer to join our team. As a key member of our Site Reliability Engineering (SRE) team, you'll be responsible for designing, building, and maintaining large-scale production systems with high efficiency and availability. This is a highly specialized discipline that demands knowledge across...


  • Santa Clara, California, United States NVIDIA Full time

    Reliability Engineer Job Description: NVIDIA is a leader in the field of computer graphics, PC gaming, and accelerated computing. We are seeking a highly skilled Reliability Engineer to join our team. Key Responsibilities: Develop, debug, and manage test programs for the HTOL oven. Review and design HTOL board schematics for various ovens. Diagnose signal...


  • Santa Clara, California, United States NVIDIA Full time

    Reliability Engineer: NVIDIA is a leader in the field of artificial intelligence and high-performance computing. We are seeking a highly skilled Reliability Engineer to join our team. The successful candidate will be responsible for providing expertise in hardware reliability engineering for electronics and server systems. This will involve establishing and...


  • Santa Clara, California, United States NVIDIA Full time

    Job Description: NVIDIA is seeking a highly skilled Senior Site Reliability Engineer to join our team. As a key member of our SRE team, you will be responsible for designing, implementing, and supporting operational and reliability aspects of large scale Kubernetes clusters. Key Responsibilities: Design and implement operational and reliability aspects of large...


  • Santa Clara, California, United States Palo Alto Networks Full time

    About Us: Palo Alto Networks is a leader in the cybersecurity industry, dedicated to protecting the digital way of life. Our mission is to be the cybersecurity partner of choice, and we're looking for innovators who share our passion for shaping the future of cybersecurity. We're a company built on disruption, and we're looking for individuals who are...


  • Santa Clara, California, United States Palo Alto Networks Full time

    Job Description: Palo Alto Networks is seeking a highly skilled Senior Staff Site Reliability Engineer to join our CDL/SLS team. As a key member of our infrastructure team, you will be responsible for designing, building, and operating reliable and secure cloud infrastructure. Key Responsibilities: Develop expertise in new technologies and contribute to the...


  • Santa Clara, California, United States Palo Alto Networks Full time

    About the Role: Palo Alto Networks is seeking a highly skilled Senior Staff Site Reliability Engineer to join our CDL/SLS team. As a key member of our team, you will be responsible for designing, building, and operating reliable and secure cloud infrastructure. Key Responsibilities: Contribute to the success of SRE and DevOps teams. Develop expertise in new...


  • Santa Clara, California, United States Ushur Full time

    About Ushur: Ushur is a leading provider of Customer Experience Automation solutions, empowering enterprises to deliver delightful customer and employee experiences. Our cutting-edge technologies, including Conversational AI, Machine Learning, and Intelligent Process Automation, enable Fortune 100 companies to automate their customer engagement. The Role: We are...


  • Santa Clara, California, United States NVIDIA Full time

    As a Senior Manager in Site Reliability Engineering (SRE) at NVIDIA, you will lead a team dedicated to the design, construction, and maintenance of expansive production systems, emphasizing high efficiency and availability. This role spans various domains, including software and systems engineering, cloud-scale storage, data management, and services. SRE...


  • Santa Clara, California, United States Palo Alto Networks Full time

    About the Role: We are seeking a highly skilled Senior Staff Site Reliability Engineer to join our CDL/SLS team at Palo Alto Networks. As a key member of our team, you will be responsible for designing, building, and operating reliable and secure cloud infrastructure. Our Infrastructure Platform stack includes Terraform, Kubernetes, GitLab CI/CD, GitOps,...

  • Reliability Engineer

    4 weeks ago


    Santa Clara, California, United States Omni Vision Inc Full time

    Job Title: Sr. Reliability Engineer. Omni Vision Inc is seeking a highly skilled Sr. Reliability Engineer to join our team. As a key member of our engineering team, you will be responsible for ensuring the quality and reliability of our CMOS Image Sensor products. Key Responsibilities: Review reliability qualification testing results and determine whether our...

  • Reliability Engineer

    1 month ago


    Santa Clara, California, United States Palo Alto Networks Full time

    About the Role: Palo Alto Networks is seeking a highly motivated and experienced Reliability Engineer to join our team. As a key member of our Hardware Quality and Compliance Engineering team, you will play a critical role in ensuring the quality and reliability of our new products from inception through the first year in production. Key...