Senior High Performance Computing Cluster Administrator
5 days ago
NVIDIA's Deep Learning Optimized Frameworks Group is looking for a deeply technical HPC cluster administrator to lead a diverse cluster of GPU-accelerated systems and provide architectural mentorship to product teams in the deep learning and scientific computing domains. As a member of the DLFW Infrastructure team, you will lead the design and implementation of a groundbreaking GPU compute cluster that runs demanding deep learning, high-performance computing, and other computationally intensive workloads. We are looking for an expert who can identify architectural changes or entirely new approaches for our GPU compute cluster. In this role, you will help us tackle the strategic challenges we encounter, including compute, networking, and storage design for large-scale, high-performance workloads, and effective resource utilization in a heterogeneous compute environment.
What you'll be doing:
Administer Linux systems ranging from powerful DGX servers to embedded systems, and from bring-up hardware to publicly available systems.
Coordinate storage solutions and plan for growth.
Automate configuration management, software updates, maintenance, and system-availability monitoring using modern DevOps tools (Ansible, GitLab, etc.).
Proactively communicate with management about any equipment problems and propose resolutions.
Plan, build, and install or upgrade new systems that support NVIDIA DL software.
What we need to see:
BA, BS, or MS in CS, EE, CE, or equivalent experience
4+ years of experience deploying and administering HPC clusters
Familiarity with resource scheduling managers (Slurm preferred; LSF, etc.)
Proven track record of scripting in Bash, Perl, or Python
Experience with containers (Docker, Singularity, LXC)
Deep understanding of operating systems, computer networks, and high-performance applications
Ability to work well with developers and test engineers
Dedication to providing high-quality support to your users
Ways to stand out from the crowd:
Familiarity and prior work experience with technologies such as Ansible, Git, Slurm, Zabbix, Prometheus, Grafana, and Docker
Familiarity with GPU usage in compute clusters and with CUDA
Experience with mobile and embedded systems
Basic knowledge of deep learning
Experience coding or scripting in Perl, Python, or Bash
You will also be eligible for equity and benefits. NVIDIA accepts applications on an ongoing basis.
NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status, or any other characteristic protected by law.