Senior High Performance Computing Cluster Architect

1 week ago


Santa Clara, California, United States NVIDIA Full time
NVIDIA Deep Learning Infrastructure Team

We are seeking a highly skilled HPC cluster administrator to lead our diverse cluster of GPU-accelerated systems and provide architectural mentorship to product teams in the deep learning and scientific computing domains.

Key Responsibilities:
  • Administer Linux systems, including powerful DGX servers and embedded systems, and bring up hardware to publicly available systems.
  • Coordinate storage solutions and plan for growth.
  • Automate configuration management, software updates, maintenance, and system-availability monitoring using modern DevOps tools.
  • Actively collaborate with management on any equipment problems and propose resolutions.
  • Plan, build, and install/upgrade new systems that support NVIDIA DL Software.
Requirements:
  • Bachelor's degree in Computer Science, Electrical Engineering, Computer Engineering, or equivalent experience.
  • 4+ years of previous experience deploying and administering HPC clusters.
  • Familiarity with resource scheduling managers (Slurm, LSF, etc.).
  • Proven track record of scripting in bash, Perl, or Python.
  • Experience with containers (Docker, Singularity, LXC).
  • Deep understanding of operating systems, computer networks, and high-performance applications.
  • Ability to work well with developers and test engineers.
  • Dedication to providing quality support for users.
Preferred Qualifications:
  • Familiarity with technologies such as Ansible, Git, Slurm, Zabbix, Prometheus, Grafana, and Docker.
  • Familiarity with GPU usage in compute clusters and with CUDA.
  • Experience with mobile and embedded systems.
  • Basic knowledge of Deep Learning.
  • Experience coding/scripting in Perl, Python, or bash.

NVIDIA offers a competitive salary range of $148,000 - $230,000 USD, based on location, experience, and the pay of employees in similar positions. You will also be eligible for equity and benefits. NVIDIA accepts applications on an ongoing basis.

We are committed to fostering a diverse work environment and proud to be an equal opportunity employer. We do not discriminate on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status, or any other characteristic protected by law.


