GPU Cluster Deployment Manager

7 days ago


San Francisco, California, United States Full time
Company Overview

FluidStack is a cutting-edge organization in the field of AI infrastructure, building and operating GPU supercomputers for top AI labs, governments, and enterprises. Our customers include Mistral, Poolside, Black Forest Labs, Meta, and more.

The Job Description

We are seeking an experienced Head of Computing Infrastructure to lead global deployments of GPU supercomputers with 10,000+ GPUs. As a key member of our team, you will lead engagements with OEMs, data centers, ISPs, and all other relevant infrastructure partners.

You will own sourcing and procurement and will be responsible for the timely deployment of some of the largest GPU supercomputers in the world. Your expertise will be crucial in building a world-class deployment team that can deliver multi-thousand-GPU clusters in a matter of days.

Key Responsibilities
  • Sourcing and procurement of individual components and entire systems
  • Leading relationships with OEMs, data centers, ISPs, and other infrastructure partners
  • Designing and building AI clusters that combine deep technical knowledge with customer requirements
  • Hiring and managing a team of deployment engineers
About You

An ideal candidate has:

  • 3+ years of related experience deploying GPU clusters; 5+ years deploying infrastructure at global scale
  • Strong relationships with compute and storage OEMs, data centers, ISPs, and others
  • Experience with InfiniBand or RoCE networking deployments
  • Exceptional attention to detail and ability to prioritize and deliver in a fast-paced environment


  • San Jose, California, United States beBee Careers Full time

    Engineering Leadership Role: This is an exceptional opportunity to lead a talented engineering team in delivering innovative software solutions and infrastructure services. The successful candidate will have a proven track record in CI/CD, build automation, and GPU cluster deployment. They will be responsible for developing and implementing engineering...

  • GPU Cluster Architect

    2 weeks ago


    San Jose, California, United States Canvendor Full time

    **Job Overview:** Canvendor is seeking a skilled GPU Cluster Architect to join our team. As a key member of our hardware engineering group, you will be responsible for designing and developing industry-leading GPU cluster control specifications. **Key Responsibilities:** Develop HW/FW implementation for industry-leading GPU hardware IP. Collaborate with the...


  • San Francisco, California, United States Jobleads-US Full time

    Job Description: We are seeking a talented GPU Cluster Resource Scheduler to join our team. The ideal candidate will have experience in designing and implementing advanced scheduling algorithms, resource management strategies, and optimization techniques to maximize performance and minimize costs for large-scale distributed AI workloads. Key...


  • San Jose, California, United States Advanced Micro Devices, Inc Full time

    WHAT YOU DO AT AMD CHANGES EVERYTHING. We care deeply about transforming lives with AMD technology to enrich our industry, our communities, and the world. Our mission is to build great products that accelerate next-generation computing experiences – the building blocks for the data center, artificial intelligence, PCs, gaming and embedded. Underpinning our...


  • San Jose, California, United States beBee Careers Full time

    Job Description: We are seeking a talented GPU Cluster Control Specialist to join our team. As a key member of our design team, you will be responsible for developing and implementing cutting-edge GPU cluster control solutions. Your expertise in RTL design and Verilog will enable us to deliver high-performance computing capabilities to various...


  • San Francisco, California, United States Coastal Carbon Full time

    Role Summary: We're seeking an AI Infrastructure Specialist to help run large-scale experiments and efficiently manage GPU infrastructure for foundation models and other large machine learning models. The ideal candidate will have experience with scalable training-inference pipelines, strong expertise in distributed computation infrastructure of current-generation...


  • San Francisco, California, United States Together AI Full time

    Position Overview: We are seeking a Senior Product Marketing Manager to drive the end-to-end marketing for Together GPU Clusters, a cornerstone of the Together AI Acceleration Cloud. In this role, you will define product positioning and messaging, telling our unique story of how Together AI accelerates AI training and inference through applied...


  • San Francisco, California, United States beBee Careers Full time

    Job Description: We are looking for a highly skilled Customer Support Engineer to join our team. As a key member of our support team, you will play a critical role in ensuring the success of our customers by providing timely and effective solutions to complex technical challenges. Responsibilities: Provide technical support to customers on our...

  • IT InfiniBand/GPU

    6 days ago


    San Jose, California, United States Cadence Design Systems, Inc. Full time

    At Cadence, we hire and develop leaders and innovators who want to make an impact on the world of technology. Cadence is looking for a Sr Staff Systems Engineer who accelerates strategic customer deployments, ensures on-time bring-up and deployment of HPC infrastructure, troubleshoots issues, and supports technical roles supporting HPC, InfiniBand,...


  • San Francisco, California, United States OpenAI Full time

    About the Team: Our team runs the GPU fleet that serves the models backing ChatGPT and the API. We build automation to provision and manage one of the largest cutting-edge GPU inference fleets in the world, exposing it as a single platform for other OpenAI teams to seamlessly run production applied AI workloads. We seek to learn from deployment and...