High Performance Computing Cluster Architect

4 weeks ago


Santa Clara, California, United States NVIDIA Full time

NVIDIA is seeking a highly skilled HPC cluster administrator to lead a diverse cluster of GPU-accelerated systems and provide architectural mentorship to product teams in the deep learning and scientific computing domains. As a member of the DLFW Infrastructure team, you will provide leadership in the design and implementation of groundbreaking GPU compute clusters that run demanding deep learning, high performance computing, and computationally intensive workloads.

Key Responsibilities:

  • Administer Linux systems, ranging from powerful DGX servers to embedded systems, and from bring-up hardware to publicly available systems.
  • Coordinate storage solutions and plan for growth.
  • Automate configuration management, software updates, maintenance, and monitoring of system availability using modern DevOps tools (Ansible, GitLab, etc.).
  • Proactively inform management of any equipment problems and propose resolutions.
  • Plan, build, and install/upgrade new systems that support NVIDIA DL Software.

Requirements:

  • You have a BA, BS, or MS in CS, EE, CE, or equivalent experience.
  • 4+ years of previous experience deploying and administering HPC clusters.
  • Familiarity with resource schedulers (Slurm preferred, LSF, etc.).
  • Proven ability to script in bash, Perl, or Python.
  • Experience with containers (Docker, Singularity, LXC).
  • Deep understanding of operating systems, computer networks, and high-performance applications.
  • Ability to work well with developers & test engineers.
  • Dedication to providing quality support for your users.

Preferred Qualifications:

  • Familiarity and prior work experience with technologies such as Ansible, Git, Slurm, Zabbix, Prometheus, Grafana, and Docker.
  • Familiarity with GPU usage in compute clusters and with CUDA.
  • Experience with mobile and embedded systems.
  • Basic knowledge of Deep Learning.
  • Experience coding/scripting in Perl/Python/bash.

NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status, or any other characteristic protected by law.

The base salary range is $148,000 - $230,000 USD. Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. You will also be eligible for equity and benefits.


