Senior HPC Systems Engineer

8 hours ago


Santa Clara, California, United States NVIDIA Full time
About NVIDIA

NVIDIA has been a pioneer in computer graphics, PC gaming, and accelerated computing for over 25 years. Our legacy of innovation is fueled by great technology and amazing people. Today, we're pushing the boundaries of AI to define the next era of computing.

Job Summary

We're seeking an exceptional Senior HPC Systems Engineer to join our team. As a key contributor to our most exciting computing hardware and software, you'll help drive the latest breakthroughs in artificial intelligence and GPU computing. Your expertise will help us craft improved workflows and develop new, differentiated, industry-leading solutions.

Key Responsibilities
  • Lead all aspects of implementing performance practices in large-scale infrastructure, delivering powerful tools, methodologies, and flows to validate and improve several datacenter products in parallel.
  • Accelerate strategic customer deployments and ensure speed-of-light bring-up and deployment of groundbreaking AI infrastructure by working hand-in-hand with customers to tailor designs and streamline processes to their needs.
  • Provide engineering solutions that enable large-scale performance strategies for Datacenter GPU Computing products and software stacks, maintain technical relationships with internal and external engineering teams, and assist systems engineers in building creative solutions based on NVIDIA technology.
  • Participate in engagements with various SW and FW (BMC/SBIOS/OS/drivers, etc.) teams to develop best-in-class practices and tools, and analyze, debug, and resolve critical software issues to achieve the best AI workload performance at scale.
  • Own the performance architecture and settings of at-scale datacenter products, implemented in both FW and SW components, to ensure velocity and scale while using resources efficiently.
Requirements
  • 5+ years of experience using accelerated computing for containerized datacenter computing solutions.
  • BS in Engineering, Mathematics, Physics, or Computer Science, MS or PhD desirable (or equivalent experience).
  • Solid understanding of parallel programming and communication models for accelerated computing (MPI, NCCL).
  • Experience using and managing modern cloud and container-based enterprise computing architectures.
  • C/C++/Python/Bash programming/scripting experience.
  • Experience with CPU architecture.
  • Experience with container technology and Linux-based OSes.
  • Experience working with the engineering or academic research community in support of high-performance computing or deep learning.
  • Strong verbal and written communication skills, as well as excellent teamwork skills.
  • Ability to multitask effectively in a dynamic environment.
What We Offer

NVIDIA offers highly competitive salaries and a comprehensive benefits package. As one of the technology world's most desirable employers, we're committed to fostering a diverse work environment and proud to be an equal opportunity employer.



  • Santa Clara, California, United States NVIDIA Full time

    About NVIDIA
    NVIDIA is a leader in the field of computer graphics, PC gaming, and accelerated computing. With a legacy of innovation spanning over 25 years, we're committed to pushing the boundaries of what's possible with AI and GPU computing.
    Job Summary
    We're seeking an exceptional Senior HPC Systems Engineer to join our team. As a key player in our AI...


  • Santa Clara, California, United States NVIDIA Full time

    Senior Software Engineer - HPC Infrastructure Specialist
    NVIDIA is a pioneer in the field of high-performance computing, and we're seeking a talented Senior Software Engineer to join our team. As a key member of our HPC infrastructure team, you will be responsible for designing and implementing scalable systems to meet the demands of our high-performance...


  • Santa Clara, California, United States NVIDIA Full time

    About NVIDIA
    NVIDIA is a leader in the field of artificial intelligence, machine learning, and datacenter acceleration. With a rich history of innovation, we have continuously pushed the boundaries of what is possible in the world of computing.
    Job Summary
    We are seeking an experienced Site Reliability Engineer to join our GPU AI/HPC Infrastructure team. As a...

  • HPC Cluster Engineer

    45 seconds ago


    Santa Clara, California, United States Sustainable Talent Full time

    Unlock the Power of HPC
    Sustainable Talent is seeking a seasoned HPC Cluster Engineer to join our team in shaping the future of AI, deep learning, and machine learning initiatives. As a key player in our Nvidia-powered HPC environment, you'll leverage cutting-edge GPU technology to drive groundbreaking discoveries and revolutionize industries. As a trusted...


  • Santa Clara, California, United States Nvidia Full time

    NVIDIA, a prominent player in the realms of Artificial Intelligence, High-Performance Computing, and Visualization, is on the lookout for a Lead Site Reliability Engineer specializing in HPC storage systems. This role involves collaborating with our team to architect, implement, and enhance on-premises HPC storage solutions while integrating cloud...


  • Santa Clara, California, United States NVIDIA Full time

    About NVIDIA
    NVIDIA is a leader in groundbreaking developments in Artificial Intelligence, High-Performance Computing, and Visualization. Our products and services rely heavily on NVIDIA GPUs, which serve as the visual cortex of modern computers. Our work enables new universes to explore, facilitates amazing creativity and discovery, and powers innovations...


  • Santa Clara, California, United States NVIDIA Full time

    Job Title: Senior Site Reliability Engineer
    NVIDIA is a leader in AI, machine learning, and datacenter acceleration. Our company is expanding its leadership into datacenter networking with Ethernet switches, NICs, and DPUs. We have continuously reinvented ourselves over two decades. Our invention of the GPU in 1999 sparked the growth of the PC gaming market,...


  • Santa Clara, California, United States Skilltorch Full time

    Job Overview
    Skilltorch is seeking a Senior Director of Solutions Engineering to lead our team of technical experts in delivering innovative solutions for AI and high-performance computing (HPC) applications. As a key member of our leadership team, you will be responsible for shaping the development and deployment of enterprise solutions that meet complex...


  • Santa Clara, California, United States NVIDIA Full time

    About NVIDIA
    NVIDIA is a leader in groundbreaking developments in Artificial Intelligence, High-Performance Computing, and Visualization. Our GPUs serve as the visual cortex of modern computers and are at the heart of our products and services. We are pushing the boundaries of innovation, enabling amazing creativity and discovery, and powering innovations such...


  • Santa Clara, California, United States NVIDIA Full time

    Senior Site Reliability Engineer
    NVIDIA is a leader in groundbreaking developments in Artificial Intelligence, High-Performance Computing, and Visualization. Our work opens up new universes to explore, enables unique creativity and discovery, and powers what were once science fiction inventions, from artificial intelligence to autonomous cars. We are seeking a...


  • Santa Clara, California, United States NVIDIA Full time

    Senior Site Reliability Engineer
    NVIDIA is a leader in groundbreaking developments in Artificial Intelligence, High-Performance Computing, and Visualization. Our work opens up new universes to explore, enables unique creativity and discovery, and powers what were once science fiction inventions, from artificial intelligence to autonomous cars. We are looking...


  • Santa Clara, California, United States Qualcomm Full time

    Job Title: Senior Systems Engineer
    We are seeking a highly skilled Senior Systems Engineer to join our team at Qualcomm. As a Senior Systems Engineer, you will be responsible for designing and implementing advanced signal-processing algorithms for Wireless LAN (WLAN/Wi-Fi) communications systems.
    Key Responsibilities: Apply systems knowledge and experience to...


  • Santa Clara, California, United States Oracle Full time

    Job Description
    Job Summary: We are seeking a highly skilled and experienced Senior Principal Software Engineer to join our Cloud Engineering Infrastructure Development team at Oracle. As a key member of our team, you will be responsible for designing, developing, and performance tuning the networking stack required to run distributed AI/ML/HPC workloads...

  • Solutions Architect

    3 weeks ago


    Santa Clara, California, United States NVIDIA Corporation Full time

    Solutions Architect - AI and HPC Cloud Expert
    NVIDIA Corporation is seeking a highly skilled Solutions Architect to join its Cloud Infrastructure Team. As a key member of the team, you will be responsible for designing and implementing sophisticated cloud solutions that cater to the infrastructure needs of various NVIDIA groups, including Graphics Processors,...


  • Santa Clara, California, United States Oracle Full time

    Cloud Engineering Infrastructure Development
    Oracle Cloud Infrastructure (OCI) Cluster Networking team is building an ultra-high performance network required to support AI/ML/HPC workloads. This is an exciting opportunity to join the AI revolution and design systems that allow customers to scale from tens to thousands of GPUs without compromising on...


  • Santa Clara, California, United States Qualcomm Full time

    Job Title: Senior Systems Engineer
    Qualcomm is a leading provider of wireless and wired technologies for the mobile, networking, computing, and consumer electronics markets. We are focused on inventing technologies that connect and empower people in ways that are elegant and accessible to all.
    Job Summary: We are seeking a highly skilled Senior Systems Engineer...


  • Santa Clara, California, United States Apollo Professional Solutions Full time

    Job Title: Senior Systems Engineer
    Apollo Professional Solutions is seeking a highly skilled Senior Systems Engineer to join our team. As a key member of our engineering team, you will be responsible for improving the reliability and robustness of our next generation sequencing and sample prep platforms.
    Key Responsibilities: Conduct failure analysis and root...