Senior GPU Resource Optimization Specialist

Santa Clara, California, United States | NVIDIA | Full time

NVIDIA has been at the forefront of innovation for over two decades. Our creation of the GPU in 1999 not only propelled the PC gaming industry but also transformed modern graphics and parallel computing.

More recently, GPU deep learning has ushered in a new era of artificial intelligence, a pivotal moment in computing history.

At NVIDIA, we pride ourselves on being a 'learning machine' that continuously adapts to tackle complex challenges that are unique to us and impactful to the world.

This is our mission: to enhance human creativity and intelligence. We invite you to consider joining our team.


As a key member of the GPU AI/HPC Infrastructure team, you will play a crucial role in the design and execution of cutting-edge GPU compute clusters that support demanding deep learning, high-performance computing, and resource-intensive tasks.

We are looking for an expert to improve capacity management and allocation within GPU compute clusters.

Your contributions will be vital in addressing strategic challenges related to maximizing and optimizing our utilization of all data center resources, including compute, storage, networking, and power.

You will develop methodologies, tools, and metrics to facilitate effective resource usage in a diverse computing environment and assist in planning for growth across our global computing landscape.


Key Responsibilities:
  • Enhancing our ecosystem surrounding GPU-accelerated computing, including the development of large-scale automation solutions.
  • Assisting researchers in executing their workflows on our clusters, including performance assessments and optimizations of deep learning processes.
  • Identifying customer utilization gaps and job scheduling challenges.
  • Creating automation, tools, and metrics to boost productive resource utilization.
  • Collaborating with the scheduling team to refine scheduling algorithms.
  • Conducting root cause analyses and recommending corrective measures for issues of varying scales.
  • Proactively identifying and resolving potential problems.

Qualifications:
  • Bachelor's degree in Computer Science, Electrical Engineering, or a related field, or equivalent experience.
  • A minimum of 5 years of experience in designing and managing large-scale compute infrastructure.
  • Proven experience in analyzing and optimizing performance for various AI/HPC workloads.
  • Familiarity with cluster configuration management tools such as Ansible, Puppet, or Salt.
  • Experience with advanced AI/HPC job schedulers, ideally including Slurm, Kubernetes (K8s), RTDA, or LSF.
  • Understanding of container technologies such as Docker, Singularity, Shifter, or Charliecloud.
  • Proficiency in Python programming and bash scripting.
  • Experience with AI/HPC workflows utilizing MPI.

Preferred Qualifications:
  • Experience with NVIDIA GPUs, CUDA programming, NCCL, and MLPerf benchmarking.
  • Knowledge of Machine Learning and Deep Learning concepts, algorithms, and models.
  • Proficiency in CentOS/RHEL and/or Ubuntu Linux distributions.
  • Familiarity with InfiniBand, IBOP, and RDMA, and an understanding of high-speed distributed storage systems such as Lustre and GPFS for AI/HPC workloads.
  • Experience with deep learning frameworks such as PyTorch and TensorFlow.

NVIDIA offers competitive salaries and a comprehensive benefits package. Our team includes some of the brightest minds in the industry, and our world-class engineering teams are expanding quickly.

If you are a creative and independent engineer with a genuine passion for technology, we would love to hear from you.

NVIDIA is committed to fostering a diverse work environment and is proud to be an equal opportunity employer. We value diversity in our current and future employees and do not discriminate based on race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status, or any other characteristic protected by law.