GPU Cluster Performance Optimization Specialist
5 days ago
We are seeking a highly motivated and skilled GPU Cluster Performance Engineer to join our dynamic team. In this role, you will be at the forefront of optimizing and achieving peak performance for GPU clusters.
Key Responsibilities:
- Collaborate with hardware and software teams to enhance the overall performance of GPU clusters, focusing on aspects such as RDMA throughput, latency, and collective communications.
- Develop and execute comprehensive benchmarking strategies to assess baseline performance, analyze bottlenecks, and identify areas for improvement within GPU cluster environments.
- Evaluate the scalability of GPU clusters by testing thoroughly under various workloads, ensuring optimal performance across different cluster sizes, configurations, and networking technologies (InfiniBand and RoCE).
- Utilize profiling tools and methodologies to analyze and identify performance bottlenecks, providing actionable insights for improvement.
- Implement optimization strategies, including but not limited to protocol enhancements, load balancing techniques, and parallel processing optimizations.
- Create detailed documentation of performance analysis, tuning efforts, and outcomes, providing clear and concise reports for internal teams and stakeholders.
- Work closely with cross-functional teams, including hardware engineers, software developers, and system architects, to integrate performance improvements into the GPU cluster architecture.
- Stay current with the latest developments in GPU architectures, parallel processing, and emerging technologies to drive continuous improvement in GPU cluster performance.
- Proven experience in optimizing the performance of GPU clusters.
- Strong understanding of GPU architectures, parallel computing concepts, and network protocols.
- Proficiency in scripting languages (e.g., Python, Bash) for automation and performance analysis.
- Experience with system-level performance analysis tools and methodologies for GPU clusters.
- Analytical mindset with excellent problem-solving and debugging skills.
- Familiarity with cluster management tools and systems.
- Excellent communication and collaboration skills for effective teamwork.
- RDMA network configuration, troubleshooting, and performance tuning.
- Linux kernel networking expertise.
- Machine learning and/or HPC system design.
- Bachelor's or Master's degree in computer science or equivalent experience.
At AMD, your base pay is one part of your total rewards package. Your base pay will depend on where your skills, qualifications, experience, and location fit into the hiring range for the position. You may be eligible for incentives based upon your role such as either an annual bonus or sales incentive. Many AMD employees have the opportunity to own shares of AMD stock, as well as a discount when purchasing AMD stock if voluntarily participating in AMD's Employee Stock Purchase Plan. You'll also be eligible for competitive benefits described in more detail here.
AMD does not accept unsolicited resumes from headhunters, recruitment agencies, or fee-based recruitment services. AMD and its subsidiaries are equal opportunity, inclusive employers and will consider all applicants without regard to age, ancestry, color, marital status, medical condition, mental or physical disability, national origin, race, religion, political and/or third-party affiliation, sex, pregnancy, sexual orientation, gender identity, military or veteran status, or any other characteristic protected by law.
We encourage applications from all qualified candidates and will accommodate applicants' needs under the respective laws throughout all stages of the recruitment and selection process.
-
GPU Cluster Performance Optimization Specialist
2 weeks ago
Santa Clara, California, United States Advanced Micro Devices, Inc. Full time
GPU Cluster Performance Engineer. We are seeking a highly motivated and skilled GPU Cluster Performance Engineer to join our dynamic team at Advanced Micro Devices, Inc. In this role, you will be at the forefront of optimizing and achieving peak performance for GPU clusters. The ideal candidate will have a strong background in GPU architectures, parallel...
-
GPU Cluster Performance Optimization Specialist
3 weeks ago
Santa Clara, California, United States Advanced Micro Devices, Inc. Full time
GPU Cluster Performance Engineer. At Advanced Micro Devices, Inc., we're pushing the boundaries of innovation to solve the world's most complex challenges. We're seeking a highly skilled GPU Cluster Performance Engineer to join our dynamic team. Key Responsibilities: Performance Optimization: Collaborate with hardware and software teams to enhance the overall...
-
GPU Cluster Performance Optimization Engineer
2 weeks ago
Santa Clara, California, United States Advanced Micro Devices, Inc. Full time
Job Title: GPU Cluster Performance Engineer. We are seeking a highly skilled and motivated GPU Cluster Performance Engineer to join our dynamic team at Advanced Micro Devices, Inc. (AMD). As a key member of our team, you will be responsible for optimizing and achieving peak performance for GPU clusters. Key Responsibilities: Collaborate with hardware and...
-
GPU Performance Optimization Specialist
4 weeks ago
Santa Clara, California, United States Apple Full time
About the Role: We are seeking a highly skilled Performance Engineer to join our team at Apple. As a key member of our GPU, Graphics, and Display Software team, you will play a critical role in optimizing the performance of our latest Apple Silicon GPUs. Key Responsibilities: Analyze workloads to identify hardware issues and software bottlenecks. Collaborate with...
-
GPU Performance Optimization Engineer
1 month ago
Santa Clara, California, United States Advanced Micro Devices, Inc. Full time
About the Role: We are seeking a highly motivated and experienced GPU Performance Optimization Engineer to join our team at Advanced Micro Devices, Inc. (AMD). As a key member of our datacenter GPU platform performance team, you will be responsible for ensuring that our GPU-accelerated systems operate at peak performance, enabling our customers to solve the...
-
GPU Performance Optimization Specialist
3 weeks ago
Santa Clara, California, United States Apple Full time
About the Role: We are seeking a highly skilled Performance Engineer to join our GPU, Graphics, and Display Software team at Apple. As a key member of our team, you will play a critical role in achieving excellent performance of our latest Apple Silicon GPUs. Key Responsibilities: Analyze workloads to identify hardware issues and software bottlenecks. Collaborate...
-
GPU Performance Optimization Engineer
3 weeks ago
Santa Clara, California, United States Apple Full time
About the Role: We are seeking a highly skilled Performance Engineer to join our team at Apple, where you will play a critical role in optimizing the performance of our latest Apple Silicon GPUs. As a key member of our team, you will work closely with engineers from driver, framework, hardware, and architecture teams to identify and resolve performance...
-
Performance Optimization Specialist
3 weeks ago
Santa Clara, California, United States NVIDIA Full time
About NVIDIA: NVIDIA is a leader in the field of artificial intelligence, deep learning, and autonomous vehicles. Our team is passionate about building innovative software solutions that impact the world. Job Summary: We are seeking a highly skilled software engineer to join our team as a Performance Optimization Specialist. In this role, you will work with our...
-
Senior Performance Optimization Engineer
3 weeks ago
Santa Clara, California, United States NVIDIA Full time
About the Role: We are seeking a highly skilled performance engineer to join our AI Applications organization at NVIDIA. As a performance engineer, you will work closely with our application teams to optimize the performance of our distributed cloud native accelerated video analytics applications. Key Responsibilities: Plan, enable, and drive performance...
-
Senior Performance Optimization Engineer
2 weeks ago
Santa Clara, California, United States NVIDIA Full time
Job Title: Senior Performance Optimization Engineer. We are seeking a highly skilled Senior Performance Optimization Engineer to join our AI Applications organization at NVIDIA. As a key member of our team, you will be responsible for optimizing the performance of our distributed cloud native accelerated video analytics applications. Our team is building...
-
Senior Performance Optimization Engineer
1 week ago
Santa Clara, California, United States NVIDIA Full time
Job Title: Senior Performance Optimization Engineer. We are seeking a highly skilled Senior Performance Optimization Engineer to join our AI Applications organization at NVIDIA. As a key member of our team, you will be responsible for optimizing the performance of our distributed cloud native accelerated video analytics applications. Our team is building...
-
Principal Engineer
2 weeks ago
Santa Clara, California, United States NVIDIA Full time
Job Title: Principal Engineer - Performance Optimization. We are seeking a highly skilled Principal Engineer to join our AI Applications organization at NVIDIA. As a key member of our team, you will be responsible for optimizing the performance of our distributed cloud native accelerated video analytics applications. Our team is building cutting-edge...
-
Datacenter GPU Performance Specialist
2 weeks ago
Santa Clara, California, United States Advanced Micro Devices, Inc. Full time
Unlock the Power of Datacenter GPU Performance. At Advanced Micro Devices, Inc., we're pushing the boundaries of innovation to solve the world's most complex challenges. As a Datacenter GPU Platform Performance Engineer, you'll play a critical role in ensuring our Instinct GPU-accelerated systems operate at peak performance, empowering our customers to tackle...
-
Santa Clara, California, United States NVIDIA Full time
Job Description: NVIDIA is seeking a highly skilled Senior High Performance Computing Cluster Administrator to lead a diverse cluster of GPU-accelerated systems and provide architectural mentorship to product teams in the deep learning and scientific computing domains. Key Responsibilities: Administer Linux systems, ranging from powerful DGX servers to embedded...
-
Santa Clara, California, United States NVIDIA Full time
NVIDIA Deep Learning Infrastructure Team. We are seeking a highly skilled HPC cluster administrator to lead our diverse cluster of GPU-accelerated systems and provide architectural mentorship to product teams in the deep learning and scientific computing domains. Key Responsibilities: Administer Linux systems, including powerful DGX servers and embedded systems,...
-
Performance Optimization Engineer
2 weeks ago
Santa Clara, California, United States NVIDIA Full time
About NVIDIA: NVIDIA is a leader in the field of artificial intelligence, deep learning, and autonomous vehicles. Our engineering teams are working on cutting-edge technologies that are transforming the world. Job Summary: We are seeking a highly skilled software engineer to join our team as a Performance Engineer. In this role, you will be responsible for...
-
Performance Optimization Engineer
3 weeks ago
Santa Clara, California, United States NVIDIA Full time
Unlock the Power of AI with NVIDIA. NVIDIA is seeking a talented software engineer to join our team and contribute to the development of cutting-edge AI technologies. As a Performance Optimization Engineer, you will play a critical role in optimizing the performance of Deep Learning models for NVIDIA GPUs and systems. Key Responsibilities: Optimize the...
-
HPC Cluster Engineer
2 weeks ago
Santa Clara, California, United States Sustainable Talent Full time
Unlock the Power of HPC. Sustainable Talent is seeking a seasoned HPC Cluster Engineer to join our team in shaping the future of AI, deep learning, and machine learning initiatives. As a key player in our Nvidia-powered HPC environment, you'll leverage cutting-edge GPU technology to drive groundbreaking discoveries and revolutionize industries. With over 25...
-
Performance Optimization Engineer
2 weeks ago
Santa Clara, California, United States NVIDIA Full time
Join NVIDIA's Ambitious Team as a Performance Engineer. NVIDIA is seeking a talented software engineer to join our team and contribute to the development of cutting-edge AI and Deep Learning technologies. As a Performance Engineer, you will play a crucial role in optimizing the performance of Deep Learning models for NVIDIA GPUs and systems. Key...
-
Datacenter GPU Performance Specialist
4 weeks ago
Santa Clara, California, United States Advanced Micro Devices, Inc. Full time
About the Role: We are seeking a highly motivated and experienced Datacenter GPU Platform Performance Engineer to join our team at Advanced Micro Devices, Inc. This is an exciting opportunity to work on cutting-edge technology and contribute to the development of innovative solutions for the data center, artificial intelligence, and other emerging markets. Key...