Senior Software Developer, HPC Cluster Management Specialist

4 weeks ago


Santa Clara, California, United States NVIDIA Full time
Unlock the Power of HPC Cluster Management

NVIDIA is at the forefront of transforming computer graphics, PC gaming, and accelerated computing. We're now pushing the boundaries of AI to define the next era of computing.

As a Senior Software Developer, you'll be part of a diverse and supportive environment where everyone is inspired to do their best work. You'll be working on our Linux-based cluster management software environment, developing the head node and compute node installation and provisioning processes.

Key Responsibilities:

  • Development of the head node and compute node installation and provisioning processes
  • Work on functionality in the area of edge site deployment
  • Integrating our product with the latest hardware (e.g., GPUs, DPUs, accelerators, and high-speed interconnects such as InfiniBand)
  • Work on features related to composable infrastructure management
  • Develop new features for our BIOS and firmware upgrade management
  • Develop functionality that makes Bright clusters usable for a wider range of workloads and increases scalability so that clusters can grow to a very large number of nodes
  • Adding support for new Linux distributions
  • Improving support for alternative CPU architectures such as ARM
  • Work on adding features to our Ansible collections for Cluster Installation and Management

Requirements:

  • Degree in Computer Science or related field (or equivalent experience)
  • 7+ years of experience in software development and/or related roles
  • Our software is based on Linux. You should be very familiar with the Linux operating system and, in particular, with Linux networking concepts
  • Good practical knowledge of the most common software installed as part of a typical Linux installation
  • Proficient in Python and intimately familiar with object-oriented software design, design patterns, and concurrent programming techniques
  • Emphasis on high-quality work and on producing clean code
  • Eager to learn and use new technologies

What We Offer:

  • Competitive base salary range: $180,000 - $339,250 USD
  • Eligibility for equity and benefits
  • NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer


  • Santa Clara, California, United States NVIDIA Full time

    Job Summary: NVIDIA is seeking a highly skilled Senior HPC Cluster Administrator to lead our GPU Compute Cluster team. As a key member of our Deep Learning Frameworks Group, you will be responsible for designing and implementing cutting-edge GPU compute clusters that run demanding deep learning, high performance computing, and computationally intensive...


  • Santa Clara, California, United States NVIDIA Full time

    NVIDIA is a leader in the field of high-performance computing, and we are seeking a skilled Senior Software Engineer to join our team. The ideal candidate will have a strong background in software development, with experience in designing and creating reliable distributed systems. They will also have the ability to implement well-thought-out long-term...


  • Santa Clara, California, United States NVIDIA Full time

    Technical Support Specialist for Linux and HPC Infrastructure: NVIDIA is a leader in computer graphics, PC gaming, and accelerated computing. We're seeking a Technical Support Specialist to join our team and provide expert support for our Linux-based cluster management software product. Key Responsibilities: Provide technical support to internal and external...


  • Santa Clara, California, United States HPE Full time

    Job Description: Hewlett Packard Enterprise is seeking a highly skilled Software Engineer to join our HPC and AI organization. As a key member of the Slingshot Ethernet Fabric team, you will play a critical role in expanding HPE's High Performance Ethernet Fabric product growth through Commercial HPC use cases, AI use cases networking, systems, and...


  • Santa Clara, California, United States NVIDIA Full time

    Job Description: NVIDIA is the world leader in computer graphics, artificial intelligence, and accelerated computing. For over 25 years, we have been at the forefront of research and engineering around the greatest advances in technology. Our history of innovation drives us to solve the world's hardest problems. We are looking for a Senior HPC and AI Solutions...


  • Santa Clara, California, United States HPE Full time

    About the Role: Hewlett Packard Enterprise (HPE) is seeking an experienced Software Engineer to join the Slingshot Ecosystem Development Team. This role will focus on expanding HPE's High Performance Ethernet Fabric product growth through Commercial HPC use cases, AI use cases networking, systems, and application and open-source...


  • Santa Clara, California, United States NVIDIA Full time

    NVIDIA's Deep Learning Optimized Frameworks Group is seeking a highly skilled HPC cluster administrator to lead a diverse cluster of GPU-accelerated systems and provide architectural guidance to product teams in the deep learning and scientific computing domains. As a member of the DLFW Infrastructure team, you will provide leadership in the design and...


  • Santa Clara, California, United States NVIDIA Full time

    NVIDIA is seeking a highly skilled HPC cluster administrator to lead a diverse cluster of GPU-accelerated systems and provide architectural mentorship to product teams in the deep learning and scientific computing domains. As a member of the DLFW Infrastructure team, you will provide leadership in the design and implementation of groundbreaking GPU compute...


  • Santa Clara, California, United States NVIDIA Full time

    Technical Support Engineer, Linux and HPC: NVIDIA is a leader in computer graphics, PC gaming, and accelerated computing. We're committed to innovation and excellence in our products and services. This role is a key part of our customer support team, providing technical assistance to customers running our Linux-based cluster management software. Key...


  • Santa Clara, California, United States NVIDIA Full time

    Job Title: Senior Product Architect, HPC and AI. Job Summary: We are seeking a visionary Product Architect to join our team at NVIDIA. As a key member of our team, you will harness your infrastructure expertise to create reference designs for the world's most powerful AI clusters. Responsibilities: Design the next-gen datacenter-scale AI infrastructure,...


  • Santa Clara, California, United States NVIDIA Full time

    A key part of NVIDIA's strength is our sophisticated analysis and debugging tools that empower NVIDIA engineers to improve performance and power efficiency of our products and the running applications. We are seeking a forward-thinking, hard-working, and creative software engineer to join our multifaceted software team with high standards. This role involves...


  • Santa Clara, California, United States NVIDIA Full time

    A key part of NVIDIA's strength is our sophisticated analysis and debugging tools that empower NVIDIA engineers to improve performance and power efficiency of our products and the running applications. We are seeking a forward-thinking, hard-working, and creative software engineer to join a multifaceted software team with high standards. This role involves...


  • Santa Clara, California, United States NVIDIA Full time

    NVIDIA is a leader in the field of computer graphics, PC gaming, and accelerated computing. We're now leveraging the power of AI to drive the next era of computing. As a Senior Software Developer, you'll be part of a diverse and supportive team that's passionate about innovation and excellence. Our cluster management software is built on Linux, and we're...


  • Santa Clara, California, United States NVIDIA Full time

    Job Summary: NVIDIA is a leader in groundbreaking developments in Artificial Intelligence, High Performance Computing, and Visualization. As a GPU Communications Expert, you will be part of the GPU Communications Libraries and Networking team, delivering communication runtimes like NCCL and NVSHMEM for Deep Learning and HPC applications. We are looking for a...


  • Santa Clara, California, United States Amazon Full time

    Job Description: We are seeking a highly skilled Sr. Worldwide Specialist Solutions Architect to join our team at Amazon Web Services (AWS). As a key member of our sales organization, you will work with customers to design and implement cloud-based solutions for High Performance Computing (HPC) workloads. Key Responsibilities: Design and architect HPC solutions...


  • Santa Clara, California, United States NVIDIA Full time

    About the Role: NVIDIA is a leader in groundbreaking developments in Artificial Intelligence, High Performance Computing, and Visualization. The GPU, our invention, serves as the visual cortex of modern computers and is at the heart of our products and services. Our work opens up new universes to explore, enables groundbreaking creativity and discovery, and...


  • Santa Clara, California, United States NVIDIA Full time

    NVIDIA is a leader in groundbreaking developments in Artificial Intelligence, High Performance Computing, and Visualization. Our work opens up new universes to explore, enables amazing creativity and discovery, and powers what were once science fiction inventions from artificial intelligence to autonomous cars. We are the GPU Communications Libraries and...


  • Santa Clara, California, United States NVIDIA Full time

    NVIDIA is a leader in groundbreaking developments in Artificial Intelligence, High Performance Computing, and Visualization. The GPU, our invention, serves as the visual cortex of modern computers and is at the heart of our products and services. We are the GPU Communications Libraries and Networking team at NVIDIA. We deliver communication runtimes like NCCL...


  • Santa Clara, California, United States NVIDIA Full time

    NVIDIA is a leader in the field of high-performance computing, and we are seeking a skilled Senior Software Engineer to join our team. The ideal candidate will have a strong background in software development, with experience in designing and implementing reliable distributed systems. They will also have a solid understanding of scalability challenges and...


  • Santa Clara, California, United States NVIDIA Full time

    NVIDIA is seeking a senior build and continuous integration (CI/CD) engineer for its GenAI Frameworks (NeMo, Megatron Core) team. NVIDIA NeMo is an open-source, scalable, and cloud-native framework built for researchers and developers working on Large Language Models (LLM), Multimodal (MM), and Speech AI. NeMo provides end-to-end model training, including data...