Senior High Performance Computing Cluster Administrator - Santa Clara - NVIDIA
2 months ago
NVIDIA's Deep Learning Optimized Frameworks Group is looking for a deeply technical HPC cluster administrator to lead a diverse cluster of GPU-accelerated systems and provide architectural mentorship to product teams in the deep learning and scientific computing domains. As a member of the DLFW Infrastructure team, you will provide leadership in the design and implementation of a groundbreaking GPU compute cluster that runs demanding deep learning, high performance computing, and computationally intensive workloads. We are looking for an expert to identify architectural changes or entirely new approaches for our GPU compute cluster. In this role, you will help us with the strategic challenges we encounter, including compute, networking, and storage design for large-scale, high-performance workloads and effective resource utilization in a heterogeneous compute environment.
What you'll be doing:
Administer Linux systems ranging from powerful DGX servers to embedded systems, and from bring-up hardware to publicly available systems.
Coordinate storage solutions and plan for growth.
Automate configuration management, software updates, maintenance, and monitoring of system availability using modern DevOps tools (Ansible, GitLab, etc.).
Actively communicate with management about any problems with the equipment and propose resolutions.
Plan, build, and install or upgrade systems that support NVIDIA DL software.
What we need to see:
You have a BA, BS, or MS in CS, EE, CE or equivalent experience
4+ years of experience deploying and administering HPC clusters
Familiarity with resource schedulers (Slurm preferred, LSF, etc.)
Proven ability to script in Bash, Perl, or Python
Experience with containers (Docker, Singularity, LXC)
Deep understanding of operating systems, computer networks, and high-performance applications
Ability to work well with developers & test engineers
Dedication to providing quality support for your users
Ways to stand out from the crowd:
Familiarity and prior work experience with technologies such as Ansible, Git, Slurm, Zabbix, Prometheus, Grafana, and Docker
Familiarity with GPU usage in compute clusters and with CUDA
Experience with mobile and embedded systems
Basic knowledge of deep learning
Experience coding or scripting in Perl, Python, or Bash
You will also be eligible for equity and benefits. NVIDIA accepts applications on an ongoing basis.
NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.