GPU Cluster Deployment Manager
7 days ago
FluidStack is a cutting-edge organization in the field of AI infrastructure, building and operating GPU supercomputers for top AI labs, governments, and enterprises. Our customers include Mistral, Poolside, Black Forest Labs, Meta, and more.
The Job Description
We are seeking an experienced Head of Computing Infrastructure to lead deployments of 10,000+ GPU supercomputers globally. As a key member of our team, you will lead engagements with OEMs, data centers, ISPs, and all relevant infrastructure partners.
You will own sourcing and procurement and be responsible for the timely deployment of some of the largest GPU supercomputers in the world. Your expertise will be crucial in building a world-class deployment team that can deliver multi-thousand-GPU clusters in a matter of days.
Key Responsibilities
- Sourcing and procurement of individual components and entire systems
- Leading relationships with OEMs, data centers, ISPs, and other infrastructure partners
- Designing and building AI clusters, combining deep technical knowledge with customer requirements
- Hiring and managing a team of deployment engineers
An ideal candidate has:
- 3+ years of related experience deploying GPU clusters; 5+ years deploying infrastructure at global scale
- Strong relationships with compute and storage OEMs, data centers, ISPs, and others
- Experience with InfiniBand or RoCE networking deployments
- Exceptional attention to detail and ability to prioritize and deliver in a fast-paced environment
-
GPU Cluster Deployment Lead
4 days ago
San Jose, California, United States · beBee Careers · Full time
Engineering Leadership Role
This is an exceptional opportunity to lead a talented engineering team in delivering innovative software solutions and infrastructure services. The successful candidate will have a proven track record in CI/CD, build automation, and GPU cluster deployment. They will be responsible for developing and implementing engineering...
-
GPU Cluster Architect
2 weeks ago
San Jose, California, United States · Canvendor · Full time
Job Overview:
Canvendor is seeking a skilled GPU Cluster Architect to join our team. As a key member of our hardware engineering group, you will be responsible for designing and developing industry-leading GPU cluster control specifications.
Key Responsibilities:
- Develop HW/FW implementation for industry-leading GPU hardware IP.
- Collaborate with the...
-
GPU Cluster Resource Scheduler
2 weeks ago
San Francisco, California, United States · Jobleads-US · Full time
Job Description
We are seeking a talented GPU Cluster Resource Scheduler to join our team. The ideal candidate will have experience in designing and implementing advanced scheduling algorithms, resource management strategies, and optimization techniques to maximize performance and minimize costs for large-scale distributed AI workloads.
Key...
-
San Jose, California, United States · Advanced Micro Devices, Inc · Full time
WHAT YOU DO AT AMD CHANGES EVERYTHING
We care deeply about transforming lives with AMD technology to enrich our industry, our communities, and the world. Our mission is to build great products that accelerate next-generation computing experiences: the building blocks for the data center, artificial intelligence, PCs, gaming and embedded. Underpinning our...
-
GPU Cluster Control Specialist
3 days ago
San Jose, California, United States · beBee Careers · Full time
Job Description:
We are seeking a talented GPU Cluster Control Specialist to join our team. As a key member of our design team, you will be responsible for developing and implementing cutting-edge GPU cluster control solutions. Your expertise in RTL design and Verilog will enable us to deliver high-performance computing capabilities to various...
-
GPU Optimization Engineer
1 week ago
San Francisco, California, United States · Coastal Carbon · Full time
Role Summary
We're seeking an AI Infrastructure Specialist to help run large-scale experiments and to efficiently manage GPU infrastructure for foundation models and other large machine learning models. The ideal candidate will have experience with scalable training and inference pipelines, strong expertise in distributed computation infrastructure of current-generation...
-
Sr. Product Marketing Manager, GPU Clusters
2 weeks ago
San Francisco, California, United States · Together AI · Full time
Position Overview
We are seeking a Senior Product Marketing Manager to drive end-to-end marketing for Together GPU Clusters, a cornerstone of the Together AI Acceleration Cloud. In this role, you will define product positioning and messaging, telling our unique story about how Together AI accelerates AI training and inference through applied...
-
GPU Cluster Support Engineer
3 days ago
San Francisco, California, United States · beBee Careers · Full time
Job Description:
We are looking for a highly skilled Customer Support Engineer to join our team. As a key member of our support team, you will play a critical role in ensuring the success of our customers by providing timely and effective solutions to complex technical challenges.
Responsibilities:
- Provide technical support to customers on our...
-
IT InfiniBand/GPU
6 days ago
San Jose, California, United States · Cadence Design Systems, Inc. · Full time
At Cadence, we hire and develop leaders and innovators who want to make an impact on the world of technology. Cadence is looking for a Sr. Staff Systems Engineer to accelerate strategic customer deployments, ensure on-time bring-up and deployment of HPC infrastructure, and provide troubleshooting and technical support for HPC, InfiniBand,...
-
Engineering Manager, Fleet Clusters
8 hours ago
San Francisco, California, United States · OpenAI · Full time
About the Team
Our team runs the GPU fleet that serves the models backing ChatGPT and the API. We build automation to provision and manage one of the largest cutting-edge GPU inference fleets in the world, exposing it as a single platform on which other OpenAI teams can seamlessly run production applied AI workloads. We seek to learn from deployment and...