Engineering Manager, Applied GPU Platform

1 week ago


San Francisco, United States OpenAI Full time

Our team runs the GPU fleet that serves the models backing ChatGPT and the API. We build automation to provision and manage one of the largest cutting-edge GPU inference fleets in the world, exposing it as a single platform on which other OpenAI teams can seamlessly run production applied AI workloads.

We seek to learn from deployment and distribute the benefits of AI, while ensuring that this powerful tool is used responsibly and safely. Safety is more important to us than unfettered growth.

About the Role

We are looking for an experienced engineering manager to help lead our GPU platform team. You’ll help build and scale one of the largest inference fleets in the world. You will also collaborate closely with product and infrastructure teams to help ship reliable products quickly.

In this role, you will:

  1. Manage and build a diverse team of high-performing infrastructure engineers
  2. Guide the automation roadmap for a fleet that could grow by an order of magnitude or more
  3. Build a world-class, secure compute fleet that serves users at scale
  4. Set technical direction on evolving our compute and abstractions to support a growing business
  5. Collaborate closely with a broad set of stakeholders, including product engineering, inference, security, research, and finance
  6. Work with external partners to unlock bleeding-edge compute and make it available as a turnkey resource for scheduling workloads
  7. Coach and nurture engineers to accelerate their growth and learning

You might thrive in this role if you:

  1. Have 10+ years of experience in infrastructure software engineering, including 5+ years of experience in engineering management
  2. Have prior experience building out high-performance computing infrastructure teams at scale
  3. Have provisioned bare-metal server data centers that interconnect across a WAN
  4. Have experience building hybrid-cloud platforms
  5. Care deeply about diversity, equity, and inclusion, and have a track record of building inclusive teams
  6. Own problems end-to-end, and are willing to pick up whatever knowledge you're missing to get the job done. You will be expected to be hands-on from time to time, helping the team debug issues or manage systems as needed.
  7. Have the ability to move fast in an environment where things are sometimes loosely defined and may have competing priorities or deadlines

About OpenAI

OpenAI is an AI research and deployment company dedicated to ensuring that general-purpose artificial intelligence benefits all of humanity. We push the boundaries of the capabilities of AI systems and seek to safely deploy them to the world through our products. AI is an extremely powerful tool that must be created with safety and human needs at its core, and to achieve our mission, we must encompass and value the many different perspectives, voices, and experiences that form the full spectrum of humanity.

We are an equal opportunity employer and do not discriminate on the basis of race, religion, national origin, gender, sexual orientation, age, veteran status, disability or any other legally protected status.

For US-based candidates: Pursuant to the San Francisco Fair Chance Ordinance, we will consider qualified applicants with arrest and conviction records.

We are committed to providing reasonable accommodations to applicants with disabilities, and requests can be made via this link.

At OpenAI, we believe artificial intelligence has the potential to help people solve immense global challenges, and we want the upside of AI to be widely shared. Join us in shaping the future of technology.

