GPU Cluster Deployment Manager

7 days ago

San Francisco, California, United States FluidStack Full time

Company Overview

FluidStack is a cutting-edge organization in the field of AI infrastructure, building and operating GPU supercomputers for top AI labs, governments, and enterprises. Our customers include Mistral, Poolside, Black Forest Labs, Meta, and more.

The Job Description

We are seeking an experienced Head of Computing Infrastructure to lead deployments of supercomputers with 10,000+ GPUs globally. As a key member of our team, you will be responsible for leading engagements with OEMs, data centers, ISPs, and all relevant infrastructure partners.

You will own sourcing and procurement, and you will be responsible for the timely deployment of some of the largest GPU supercomputers in the world. Your expertise will be crucial in building a world-class deployment team to deliver multi-thousand GPU clusters in a matter of days.

Key Responsibilities

Sourcing and procurement of individual components and entire systems
Leading relationships with OEMs, data centers, ISPs, and other infrastructure partners
Designing and building AI clusters that combine deep technical knowledge with customer requirements
Hiring and managing a team of deployment engineers

About You

An ideal candidate has:

3+ years of related experience deploying GPU clusters; 5+ years deploying infrastructure at global scale
Strong relationships with compute and storage OEMs, data centers, ISPs, and others
Experience with InfiniBand or RoCE networking deployments
Exceptional attention to detail and ability to prioritize and deliver in a fast-paced environment

GPU Cluster Deployment Lead

4 days ago

San Jose, California, United States beBee Careers Full time

Engineering Leadership Role: This is an exceptional opportunity to lead a talented engineering team in delivering innovative software solutions and infrastructure services. The successful candidate will have a proven track record in CI/CD, build automation, and GPU cluster deployment. They will be responsible for developing and implementing engineering...

GPU Cluster Architect

2 weeks ago

San Jose, California, United States Canvendor Full time

**Job Overview:** Canvendor is seeking a skilled GPU Cluster Architect to join our team. As a key member of our hardware engineering group, you will be responsible for designing and developing industry-leading GPU cluster control specifications. **Key Responsibilities:** Develop HW/FW implementation for industry-leading GPU hardware IP. Collaborate with the...

GPU Cluster Resource Scheduler

2 weeks ago

San Francisco, California, United States Jobleads-US Full time

Job Description: We are seeking a talented GPU Cluster Resource Scheduler to join our team. The ideal candidate will have experience in designing and implementing advanced scheduling algorithms, resource management strategies, and optimization techniques to maximize performance and minimize costs for large-scale distributed AI workloads. Key...

GPU Deployment Infrastructure Director Software Development

6 days ago

San Jose, California, United States Advanced Micro Devices, Inc Full time

WHAT YOU DO AT AMD CHANGES EVERYTHING. We care deeply about transforming lives with AMD technology to enrich our industry, our communities, and the world. Our mission is to build great products that accelerate next-generation computing experiences – the building blocks for the data center, artificial intelligence, PCs, gaming and embedded. Underpinning our...

GPU Cluster Control Specialist

3 days ago

San Jose, California, United States beBee Careers Full time

Job Description: We are seeking a talented GPU Cluster Control Specialist to join our team. As a key member of our design team, you will be responsible for developing and implementing cutting-edge GPU cluster control solutions. Your expertise in RTL design and Verilog will enable us to deliver high-performance computing capabilities to various...

GPU Optimization Engineer

1 week ago

San Francisco, California, United States Coastal Carbon Full time

Role Summary: We're seeking an AI Infrastructure Specialist to help run large-scale experiments and manage infrastructure for foundation models and large machine learning models efficiently on GPUs. The ideal candidate will have experience with scalable training-inference pipelines, strong expertise in distributed computation infrastructure of current-generation...

Sr. Product Marketing Manager, GPU Clusters

2 weeks ago

San Francisco, California, United States Together AI Full time

Position Overview: We are seeking a Senior Product Marketing Manager to drive the end-to-end marketing for Together GPU Clusters, a cornerstone of the Together AI Acceleration Cloud. In this role, you will define product positioning and messaging - telling our unique story regarding how Together AI accelerates AI training and inference through applied...

GPU Cluster Support Engineer

3 days ago

San Francisco, California, United States beBee Careers Full time

Job Description: We are looking for a highly skilled Customer Support Engineer to join our team. As a key member of our support team, you will play a critical role in ensuring the success of our customers by providing timely and effective solutions to complex technical challenges. Responsibilities: Provide technical support to customers on our...

IT InfiniBand/GPU

6 days ago

San Jose, California, United States Cadence Design Systems, Inc. Full time

At Cadence, we hire and develop leaders and innovators who want to make an impact on the world of technology. Cadence is looking for a Sr Staff Systems Engineer who accelerates strategic customer deployments and ensures on-time bring-up and deployment of HPC infrastructure and troubleshooting and supports technical roles supporting HPC, InfiniBand,...

Engineering Manager, Fleet Clusters

8 hours ago

San Francisco, California, United States OpenAI Full time

About the Team Our team runs the GPU fleet that serves the models backing ChatGPT and the API. We build automation to provision and manage one of the largest cutting edge GPU inference fleets in the world, exposing it as a singular platform for other OpenAI teams to seamlessly run production applied AI workloads. We seek to learn from deployment and...
