GPU Infrastructure Management Lead
2 weeks ago
We are seeking an experienced engineering manager to lead our GPU platform team. As a key member of our infrastructure team, you will be responsible for building and scaling one of the largest inference fleets in the world. You will collaborate closely with product and infrastructure teams to help ship reliable products quickly, while ensuring that AI is used responsibly and safely.
Key Challenges:
The successful candidate will face several challenges, including guiding the roadmap for automation for a fleet that can grow an order of magnitude in size or more, building a world-class, secure compute fleet that serves users at scale, and collaborating closely with a broad set of stakeholders. To overcome these challenges, the candidate should have a strong technical background, excellent communication skills, and a passion for leading high-performing teams.
Requirements:
- 10+ years of experience in infrastructure software engineering, including 5+ years of experience in engineering management
- Prior experience building out high-performance computing infrastructure teams at scale
- Experience provisioning bare-metal server data centers that interconnect across a WAN
- Experience building hybrid-cloud platforms
- Deep commitment to diversity, equity, and inclusion, and a track record of building inclusive teams
-
GPU Infrastructure Engineer
4 days ago
San Francisco, California, United States | beBee Careers | Full time
About the Role:
This is an exciting opportunity to join our team as a senior engineer and lead the design, development, and optimization of next-generation virtualized GPU infrastructure. You will collaborate with cross-functional teams to deliver high-quality solutions that meet customer needs. Your primary responsibility will be to guide performance teams on...
-
Senior Infrastructure Manager
2 weeks ago
San Francisco, California, United States | OpenAI | Full time
About the Role:
We are seeking an experienced engineering manager to join our GPU platform team. You will be responsible for building and scaling our large-scale inference fleet, collaborating closely with product and infrastructure teams to deliver reliable products quickly.
Key Responsibilities:
Develop and implement strategies for scaling our inference...
-
GPU Infrastructure Specialist
3 days ago
San Francisco, California, United States | beBee Careers | Full time
As a GPU Infrastructure Specialist, you will be responsible for developing and optimizing software and processes for orchestration of AI workloads over large fleets of distributed GPU hardware. This role involves creating a highly automated infrastructure pipeline for deploying and scaling distributed and multi-tenant GPU-resident compute to new cloud and...
-
GPU Platform Lead
2 weeks ago
San Francisco, California, United States | OpenAI | Full time
OpenAI is an AI research and deployment company dedicated to ensuring that general-purpose artificial intelligence benefits all of humanity. Our team runs the GPU fleet that serves the models backing ChatGPT and the API. We build automation to provision and manage one of the largest cutting-edge GPU inference fleets in the world, exposing it as a singular...
-
GPU Cluster Deployment Manager
7 days ago
San Francisco, California, United States | FluidStack | Full time
Company Overview:
FluidStack is a cutting-edge organization in the field of AI infrastructure, building and operating GPU supercomputers for top AI labs, governments, and enterprises. Our customers include Mistral, Poolside, Black Forest Labs, Meta, and more.
The Job Description:
We are seeking an experienced Head of Computing Infrastructure to lead deployments...
-
Senior GPU Architect
4 days ago
San Francisco, California, United States | beBee Careers | Full time
Job Summary:
Welcome to the role of Senior GPU Architect, where you will lead the design, development, and optimization of next-generation virtualized GPU infrastructure. As a key member of our team, you will collaborate with customers and stakeholders to define and refine infrastructure requirements for AI/ML workloads. Your expertise in designing scalable,...
-
Engineering Manager, GPU Platform
4 weeks ago
San Francisco, California, United States | OpenAI | Full time
Research - San Francisco
About the Team:
Our team runs the GPU fleet that serves the models backing ChatGPT and the API. We build automation to provision and manage one of the largest cutting-edge GPU inference fleets in the world, exposing it as a singular platform for other OpenAI...
-
GPU Optimization Engineer
1 week ago
San Francisco, California, United States | Coastal Carbon | Full time
Role Summary:
We're seeking an AI Infrastructure Specialist to help run large-scale experiments and manage infrastructure for foundation models and large machine learning models efficiently on GPUs. The ideal candidate will have experience with scalable training-inference pipelines, strong expertise in distributed computation infrastructure of current-generation...
-
Engineering Manager, GPU Platform
2 weeks ago
San Francisco, California, United States | OpenAI | Full time
About the Team:
Our team runs the GPU fleet that serves the models backing ChatGPT and the API. We build automation to provision and manage one of the largest cutting-edge GPU inference fleets in the world, exposing it as a singular platform for other OpenAI teams to seamlessly run production applied AI workloads. We seek to learn from deployment and...
-
Infrastructure Director
7 days ago
San Francisco, California, United States | FluidStack | Full time
About FluidStack:
FluidStack is a pioneering organization in the field of AI infrastructure, building and operating GPU supercomputers for top AI labs, governments, and enterprises. Our customers include Mistral, Poolside, Black Forest Labs, Meta, and more.
The Role:
We are seeking an experienced Infrastructure Director to lead deployments of 10,000+ GPU...