Senior AI-HPC Cluster Engineer
2 weeks ago
Senior HPC Cluster Engineer
Locations: US, CA, Santa Clara; US, MA, Westford; US, TX, Austin; US, NC, Durham
Time type: Full time
Posted: 7 days ago
Job requisition ID: JR1965956
NVIDIA has continuously reinvented itself over two decades. Our invention of the GPU in 1999 sparked the growth of the PC gaming market, redefined modern computer graphics, and revolutionized parallel computing. More recently, GPU deep learning ignited modern AI, the next era of computing. NVIDIA is a “learning machine” that constantly evolves by adapting to new opportunities that are hard to solve, that only we can tackle, and that matter to the world. This is our life’s work, to amplify human imagination and intelligence. Make the choice to join us today.
As a member of the GPU/HPC Infrastructure team, you will provide leadership in the design and implementation of groundbreaking GPU compute clusters that run demanding deep learning, high-performance computing, and computationally intensive workloads. We seek an expert to identify architectural changes and/or completely new approaches for our GPU compute clusters. As an expert, you will help us with the strategic challenges we encounter, including compute, networking, and storage design for large-scale, high-performance workloads; effective resource utilization in a heterogeneous compute environment; evolving our private/public cloud strategy; capacity modeling; and growth planning across our global computing environment.
What you'll be doing:
Building and improving our ecosystem around GPU-accelerated computing including developing large scale automation solutions
Maintaining and building deep learning clusters at scale
Supporting our researchers to run their flows on our clusters including performance analysis and optimizations of deep learning workflows
Performing root cause analysis and suggesting corrective action for problems at both large and small scale
Finding and fixing problems before they occur
What we need to see:
Bachelor’s degree in Computer Science, Electrical Engineering or related field or equivalent experience.
Minimum 5 years of experience designing and operating large scale compute infrastructure.
Experience analyzing and tuning performance for a variety of HPC workloads.
Working knowledge of cluster configuration management tools such as Ansible, Puppet, Salt.
Experience with HPC cluster job schedulers such as SLURM, LSF.
In-depth understanding of container technologies like Docker, Singularity, Shifter, Charliecloud.
Proficient in CentOS/RHEL and/or Ubuntu Linux distros, including Python programming and bash scripting.
Experience with HPC workflows that use MPI.
Ways to stand out from the crowd:
Understanding of MLPerf benchmarking
Familiarity with InfiniBand with IBOP and RDMA
Understanding of fast, distributed storage systems like Lustre and GPFS for HPC workloads.
Background with Software Defined Networking and HPC cluster networking
Familiarity with deep learning frameworks like PyTorch and TensorFlow
NVIDIA offers highly competitive salaries and a comprehensive benefits package. We have some of the most brilliant and talented people in the world working for us and, due to unprecedented growth, our world-class engineering teams are growing fast. If you're a creative and autonomous engineer with real passion for technology, we want to hear from you.
The base salary range is 148,000 USD - 339,250 USD. Your base salary will be determined based on your location, experience, and the pay of employees in similar positions.
You will also be eligible for equity and benefits.
NVIDIA accepts applications on an ongoing basis.
NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.
NVIDIA pioneered accelerated computing to tackle challenges no one else can solve. Our work in AI and the metaverse is transforming the world's largest industries and profoundly impacting society.