Senior Software Developer, HPC Cluster Management Specialist
4 weeks ago
NVIDIA is at the forefront of transforming computer graphics, PC gaming, and accelerated computing. We're now pushing the boundaries of AI to define the next era of computing.
As a Senior Software Developer, you'll be part of a diverse and supportive environment where everyone is inspired to do their best work. You'll be working on our Linux-based cluster management software environment, developing the head node and compute node installation and provisioning processes.
Key Responsibilities:
- Development of the head node and compute node installation and provisioning processes
- Work on functionality in the area of edge site deployment
- Integrating our product with the latest hardware (e.g., GPUs, DPUs, accelerators, high-speed interconnects such as InfiniBand)
- Work on features related to composable infrastructure management
- Develop new features for our BIOS and firmware upgrade management
- Develop functionality that makes Bright clusters usable for a wider range of workloads and increases scalability so that clusters can scale to a very large number of nodes
- Adding support for new Linux distributions
- Improving support for alternative CPU architectures such as ARM
- Work on adding features to our Ansible collections for Cluster Installation and Management
Requirements:
- Degree in Computer Science or related field (or equivalent experience)
- 7+ years of experience in software development and/or related roles
- Our software is based on Linux. You should be very familiar with the Linux operating system and in particular with networking concepts in Linux
- Good practical knowledge of the most common software installed as part of a typical Linux installation
- Proficient in Python and intimately familiar with object-oriented software design, design patterns, and concurrent programming techniques
- Emphasis on high-quality work and on producing clean code
- Eager to learn and use new technologies
What We Offer:
- Competitive base salary range: $180,000 - $339,250 USD
- Eligibility for equity and benefits
- NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer
-
Senior HPC Cluster Administrator
4 weeks ago
Santa Clara, California, United States NVIDIA Full time
Job Summary: NVIDIA is seeking a highly skilled Senior HPC Cluster Administrator to lead our GPU Compute Cluster team. As a key member of our Deep Learning Frameworks Group, you will be responsible for designing and implementing cutting-edge GPU compute clusters that run demanding deep learning, high performance computing, and computationally intensive...
-
Senior Software Engineer
4 weeks ago
Santa Clara, California, United States NVIDIA Full time
NVIDIA is a leader in the field of high-performance computing, and we are seeking a skilled Senior Software Engineer to join our team. The ideal candidate will have a strong background in software development, with experience in designing and creating reliable distributed systems. They will also have the ability to implement well-thought-out long-term...
-
Technical Support Specialist for Linux and HPC Infrastructure
Santa Clara, California, United States NVIDIA Full time
NVIDIA is a leader in computer graphics, PC gaming, and accelerated computing. We're seeking a Technical Support Specialist to join our team and provide expert support for our Linux-based cluster management software product. Key Responsibilities: Provide technical support to internal and external...
-
HPC/AI Software Engineer
4 weeks ago
Santa Clara, California, United States HPE Full time
Job Description: Hewlett Packard Enterprise is seeking a highly skilled Software Engineer to join our HPC and AI organization. As a key member of the Slingshot Ethernet Fabric team, you will play a critical role in expanding HPE's High Performance Ethernet Fabric product growth through Commercial HPC use cases, AI use cases networking, systems, and...
-
Senior HPC and AI Solutions Architect
4 weeks ago
Santa Clara, California, United States NVIDIA Full time
Job Description: NVIDIA is the world leader in computer graphics, artificial intelligence, and accelerated computing. For over 25 years, we have been at the forefront of research and engineering around the greatest advances in technology. Our history of innovation drives us to solve the world's hardest problems. We are looking for a Senior HPC and AI Solutions...
-
HPC/AI Software Engineer
4 weeks ago
Santa Clara, California, United States HPE Full time
About the Role: Hewlett Packard Enterprise (HPE) is seeking an experienced Software Engineer to join the Slingshot Ecosystem Development Team. This role will focus on expanding HPE's High Performance Ethernet Fabric product growth through Commercial HPC use cases, AI use cases networking, systems, and application and open-source...
-
Santa Clara, California, United States NVIDIA Full time
NVIDIA's Deep Learning Optimized Frameworks Group is seeking a highly skilled HPC cluster administrator to lead a diverse cluster of GPU-accelerated systems and provide architectural guidance to product teams in the deep learning and scientific computing domains. As a member of the DLFW Infrastructure team, you will provide leadership in the design and...
-
High Performance Computing Cluster Architect
4 weeks ago
Santa Clara, California, United States NVIDIA Full time
NVIDIA is seeking a highly skilled HPC cluster administrator to lead a diverse cluster of GPU-accelerated systems and provide architectural mentorship to product teams in the deep learning and scientific computing domains. As a member of the DLFW Infrastructure team, you will provide leadership in the design and implementation of groundbreaking GPU compute...
-
Technical Support Engineer, Linux and HPC
Santa Clara, California, United States NVIDIA Full time
NVIDIA is a leader in computer graphics, PC gaming, and accelerated computing. We're committed to innovation and excellence in our products and services. This role is a key part of our customer support team, providing technical assistance to customers running our Linux-based cluster management software. Key...
-
Senior Product Architect, HPC and AI
4 weeks ago
Santa Clara, California, United States NVIDIA Full time
Job Title: Senior Product Architect, HPC and AI
Job Summary: We are seeking a visionary Product Architect to join our team at NVIDIA. As a key member of our team, you will harness your infrastructure expertise to create reference designs for the world's most powerful AI clusters.
Responsibilities:
* Design the next-gen datacenter-scale AI infrastructure,...
-
Senior GPU Cluster Tools Developer
4 weeks ago
Santa Clara, California, United States NVIDIA Full time
A key part of NVIDIA's strength is our sophisticated analysis and debugging tools that empower NVIDIA engineers to improve performance and power efficiency of our products and the running applications. We are seeking a forward-thinking, hard-working, and creative software engineer to join our multifaceted software team with high standards. This role involves...
-
Senior GPU Cluster Tools Developer
4 weeks ago
Santa Clara, California, United States NVIDIA Full time
A key part of NVIDIA's strength is our sophisticated analysis and debugging tools that empower NVIDIA engineers to improve performance and power efficiency of our products and the running applications. We are seeking a forward-thinking, hard-working, and creative software engineer to join a multifaceted software team with high standards. This role involves...
-
Senior Software Developer
4 weeks ago
Santa Clara, California, United States NVIDIA Full time
NVIDIA is a leader in the field of computer graphics, PC gaming, and accelerated computing. We're now leveraging the power of AI to drive the next era of computing. As a Senior Software Developer, you'll be part of a diverse and supportive team that's passionate about innovation and excellence. Our cluster management software is built on Linux, and we're...
-
Senior System Software Engineer, NCCL
4 weeks ago
Santa Clara, California, United States NVIDIA Full time
Job Summary: NVIDIA is a leader in groundbreaking developments in Artificial Intelligence, High Performance Computing, and Visualization. As a GPU Communications Expert, you will be part of the GPU Communications Libraries and Networking team, delivering communication runtimes like NCCL and NVSHMEM for Deep Learning and HPC applications. We are looking for a...
-
Senior HPC Solutions Architect
4 weeks ago
Santa Clara, California, United States Amazon Full time
Job Description: We are seeking a highly skilled Sr. Worldwide Specialist Solutions Architect to join our team at Amazon Web Services (AWS). As a key member of our sales organization, you will work with customers to design and implement cloud-based solutions for High Performance Computing (HPC) workloads. Key Responsibilities: Design and architect HPC solutions...
-
Senior Software Architect
4 weeks ago
Santa Clara, California, United States NVIDIA Full time
About the Role: NVIDIA is a leader in groundbreaking developments in Artificial Intelligence, High Performance Computing, and Visualization. The GPU, our invention, serves as the visual cortex of modern computers and is at the heart of our products and services. Our work opens up new universes to explore, enables groundbreaking creativity and discovery, and...
-
Senior System Software Engineer, NCCL
4 weeks ago
Santa Clara, California, United States NVIDIA Full time
NVIDIA is a leader in groundbreaking developments in Artificial Intelligence, High Performance Computing, and Visualization. Our work opens up new universes to explore, enables amazing creativity and discovery, and powers what were once science fiction inventions from artificial intelligence to autonomous cars. We are the GPU Communications Libraries and...
-
Senior System Software Engineer, NCCL
4 weeks ago
Santa Clara, California, United States NVIDIA Full time
NVIDIA is a leader in groundbreaking developments in Artificial Intelligence, High Performance Computing, and Visualization. The GPU, our invention, serves as the visual cortex of modern computers and is at the heart of our products and services. We are the GPU Communications Libraries and Networking team at NVIDIA. We deliver communication runtimes like NCCL...
-
Senior Software Architect
4 weeks ago
Santa Clara, California, United States NVIDIA Full time
NVIDIA is a leader in the field of high-performance computing, and we are seeking a skilled Senior Software Engineer to join our team. The ideal candidate will have a strong background in software development, with experience in designing and implementing reliable distributed systems. They will also have a solid understanding of scalability challenges and...
-
Senior MLOps Engineer, GenAI Framework
4 weeks ago
Santa Clara, California, United States NVIDIA Full time
NVIDIA is seeking a senior build and continuous integration (CI/CD) engineer for its GenAI Frameworks (NeMo, Megatron Core) team. NVIDIA NeMo is an open-source, scalable, and cloud-native framework built for researchers and developers working on Large Language Models (LLM), Multimodal (MM), and Speech AI. NeMo provides end-to-end model training, including data...