Current jobs related to Senior High Performance Computing Cluster Administrator - Santa Clara, California - NVIDIA


  • Santa Clara, California, United States NVIDIA Full time

    NVIDIA Deep Learning Infrastructure Team. We are seeking a highly skilled and experienced HPC cluster administrator to lead our diverse cluster of GPU-accelerated systems and provide architectural mentorship to product teams in the deep learning and scientific computing domains. Key Responsibilities: Design and implement groundbreaking GPU compute clusters that...


  • Santa Clara, California, United States NVIDIA Full time

    Job Title: Senior High Performance Computing Cluster Administrator. NVIDIA's Deep Learning Optimized Frameworks Group is seeking a highly skilled HPC cluster administrator to lead a diverse cluster of GPU-accelerated systems and provide architectural mentorship to product teams in the deep learning and scientific computing domains. Key...


  • Santa Clara, California, United States NVIDIA Full time

    NVIDIA's Deep Learning Optimized Frameworks Group is seeking a highly skilled HPC cluster administrator to lead a diverse cluster of GPU-accelerated systems and provide architectural guidance to product teams in the deep learning and scientific computing domains. As a member of the DLFW Infrastructure team, you will provide leadership in the design and...


  • Santa Clara, California, United States NVIDIA Full time

    NVIDIA is seeking a highly skilled HPC cluster administrator to lead a diverse cluster of GPU-accelerated systems and provide architectural mentorship to product teams in the deep learning and scientific computing domains. As a member of the DLFW Infrastructure team, you will provide leadership in the design and implementation of groundbreaking GPU compute...


  • Santa Clara, California, United States Santa Clara University Full time

    Job Title: High Performance Computing Systems Administrator. Job Summary: We are seeking a highly skilled High Performance Computing (HPC) Systems Administrator to join our team at Santa Clara University. The successful candidate will be responsible for the administration, maintenance, and optimization of our HPC systems, ensuring the smooth operation of our...


  • Santa Clara, California, United States NVIDIA Full time

    Job Summary: NVIDIA is seeking a highly skilled Senior HPC Cluster Administrator to lead our GPU Compute Cluster team. As a key member of our Deep Learning Frameworks Group, you will be responsible for designing and implementing cutting-edge GPU compute clusters that run demanding deep learning, high performance computing, and computationally intensive...


  • Santa Clara, California, United States Santa Clara University Full time

    Job Title: High Performance Computing Systems Administrator. Job Summary: Santa Clara University is seeking a highly skilled High Performance Computing Systems Administrator to join our dynamic team. As a key member of our IT department, you will be responsible for the administration, maintenance, and optimization of our HPC systems, ensuring the smooth...


  • Santa Clara, California, United States Santa Clara University Full time

    Job Title: High Performance Computing Systems Administrator. Job Summary: We are seeking a highly skilled High Performance Computing Systems Administrator to join our team at Santa Clara University. The successful candidate will be responsible for the administration, maintenance, and optimization of our HPC systems, ensuring the smooth operation of our...


  • Santa Clara, California, United States NVIDIA Full time

    NVIDIA is a leader in groundbreaking developments in Artificial Intelligence, High Performance Computing, and Visualization. Our work opens up new universes to explore, enables amazing creativity and discovery, and powers what were once science fiction inventions from artificial intelligence to autonomous cars. We are the GPU Communications Libraries and...


  • Santa Clara, California, United States NVIDIA Full time

    Unlock the Power of HPC Cluster Management. NVIDIA is at the forefront of transforming computer graphics, PC gaming, and accelerated computing. We're now pushing the boundaries of AI to define the next era of computing. As a Senior Software Developer, you'll be part of a diverse and supportive environment where everyone is inspired to do their best work. You'll...


  • Santa Clara, California, United States NVIDIA Full time

    A key part of NVIDIA's strength is our sophisticated analysis and debugging tools that empower NVIDIA engineers to improve performance and power efficiency of our products and the running applications. We are seeking a forward-thinking, hard-working, and creative software engineer to join a multifaceted software team with high standards. This role involves...


  • Santa Clara, California, United States NVIDIA Full time

    A key part of NVIDIA's strength is our sophisticated analysis and debugging tools that empower NVIDIA engineers to improve performance and power efficiency of our products and the running applications. We are seeking a forward-thinking, hard-working, and creative software engineer to join our multifaceted software team with high standards. This role involves...


  • Santa Clara, California, United States NVIDIA Full time

    Unlock the Power of High-Performance Computing. NVIDIA is revolutionizing the field of Artificial Intelligence, High Performance Computing, and Visualization. As a key player in this space, we're seeking a motivated Performance Engineer to join our GPU Communications Libraries and Networking team. As a Performance Engineer, you'll play a crucial role in shaping...


  • Santa Clara, California, United States NVIDIA Full time

    NVIDIA is seeking a highly skilled Cloud AI Infrastructure Engineer to drive the performance analysis, optimization, and modeling of NVIDIA DGX™ Cloud clusters. The ideal candidate will have a deep understanding of the methodology to conduct end-to-end performance analysis of critical AI applications running on large-scale parallel and distributed...


  • Santa Clara, California, United States NVIDIA Full time

    Job Title: Senior Performance Optimization Engineer. We are seeking a highly skilled Senior Performance Optimization Engineer to join our AI Applications organization at NVIDIA. As a key member of our team, you will be responsible for optimizing the performance of our distributed cloud native accelerated video analytics applications. Our team is building...


  • Santa Clara, California, United States NVIDIA Full time

    About NVIDIA: NVIDIA is a leader in the field of artificial intelligence, machine learning, and datacenter acceleration. Our company has a rich history of innovation, with a legacy that dates back to the invention of the GPU in 1999. This groundbreaking technology sparked the growth of the PC gaming market, redefined modern computer graphics, and...


  • Santa Clara, California, United States NVIDIA Full time

    NVIDIA is seeking a senior build and continuous integration (CI/CD) engineer for its GenAI Frameworks (NeMo, Megatron Core) team. NVIDIA NeMo is an open-source, scalable, and cloud-native framework built for researchers and developers working on Large Language Models (LLM), Multimodal (MM), and Speech AI. NeMo provides end-to-end model training, including data...


  • Santa Clara, California, United States NVIDIA Full time

    Job Title: Senior High-Performance AI Training Engineer. We are seeking a highly skilled Senior High-Performance AI Training Engineer to join our team at NVIDIA. As a key member of our engineering team, you will be responsible for optimizing AI training workloads on innovative hardware and software platforms. Key Responsibilities: Understand and analyze AI...


  • Santa Clara, California, United States NVIDIA Full time

    Job Title: Senior High-Performance AI Training Engineer. NVIDIA is seeking a highly skilled Senior High-Performance AI Training Engineer to join our team. As a key member of our engineering team, you will be responsible for optimizing AI training workloads on innovative hardware and software platforms. Key Responsibilities: Understand, analyze, profile, and...

Senior High Performance Computing Cluster Administrator

2 months ago


Santa Clara, California, United States NVIDIA Full time
Job Description

NVIDIA is seeking a highly skilled Senior High Performance Computing Cluster Administrator to lead a diverse cluster of GPU-accelerated systems and provide architectural mentorship to product teams in the deep learning and scientific computing domains.

Key Responsibilities
  • Administer Linux systems, ranging from powerful DGX servers to embedded systems, and bring up hardware to publicly available systems.
  • Coordinate storage solutions and plan for growth.
  • Automate configuration management, software updates, maintenance, and monitoring of system availability using modern DevOps tools (Ansible, GitLab, etc.).
  • Proactively communicate with management about equipment problems and propose resolutions.
  • Plan, build, and install/upgrade new systems that support NVIDIA DL Software.
Requirements
  • BA, BS, or MS in CS, EE, CE, or equivalent experience.
  • 4+ years of previous experience deploying and administering HPC clusters.
  • Familiarity with resource schedulers (Slurm preferred, LSF, etc.).
  • Proven ability to script in Bash, Perl, or Python.
  • Experience with containers (Docker, Singularity, LXC).
  • Deep understanding of operating systems, computer networks, and high-performance applications.
  • Ability to work well with developers & test engineers.
  • Dedication to providing quality support for your users.
Preferred Qualifications
  • Familiarity and prior work experience with technologies such as Ansible, Git, Slurm, Zabbix, Prometheus, Grafana, and Docker.
  • Familiarity with GPU usage in compute clusters and with CUDA.
  • Experience with mobile and embedded systems.
  • Basic knowledge of Deep Learning.
  • Experience coding/scripting in Perl, Python, or Bash.
What We Offer

NVIDIA offers a competitive salary range of $148,000 - $230,000 USD, based on location, experience, and the pay of employees in similar positions. You will also be eligible for equity and benefits.

NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. We do not discriminate on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status, or any other characteristic protected by law.