Platform ML Engineering Manager, Inference

4 weeks ago


San Francisco CA, United States OpenAI Full time
About the Team

The Platform ML team builds the ML side of our state-of-the-art internal training framework used to train our cutting-edge models. We work on distributed model execution as well as the interfaces and implementation for model code, training, and inference.

Our priorities are to maximize training throughput (how quickly we can train a new model) and researcher throughput (how quickly we can develop new models) with the goal of accelerating progress towards AGI. We frequently collaborate with other teams to speed up the development of new capabilities.

About the Role

We are looking for an experienced engineering manager to help lead critical work on our shared internal inference stack and grow the team. Our inference stack is primarily built by the Applied AI engineering team and we will improve and extend it for research use cases.

In this role, you will:
  • Get SOTA throughput for our most important research models.
  • Reduce the time it takes to get efficient inference for new model architectures.
  • Collaborate closely with Applied AI engineering to maximize the benefits of our shared internal inference stack.
  • Hire world-class AI systems engineers in one of the most competitive hiring markets.
  • Coordinate the inference needs of OpenAI's research teams.
  • Create a diverse, equitable, and inclusive culture that makes everyone feel welcome while enabling radical candor and the challenging of groupthink.
You might thrive in this role if you:
  • Have 3+ years of experience in engineering management and 7+ years as an IC working with high-scale distributed systems and ML systems.
  • Have experience with ML systems, particularly high-scale distributed training or inference for modern LLMs.
  • Have familiarity with the latest AI research and working knowledge of how these systems are efficiently implemented.
  • Care deeply about diversity, equity, and inclusion, and have a track record of building inclusive teams.

About OpenAI

OpenAI is an AI research and deployment company dedicated to ensuring that general-purpose artificial intelligence benefits all of humanity. We push the boundaries of the capabilities of AI systems and seek to safely deploy them to the world through our products. AI is an extremely powerful tool that must be created with safety and human needs at its core, and to achieve our mission, we must encompass and value the many different perspectives, voices, and experiences that form the full spectrum of humanity.

We are an equal opportunity employer and do not discriminate on the basis of race, religion, national origin, gender, sexual orientation, age, veteran status, disability or any other legally protected status.

For US Based Candidates: Pursuant to the San Francisco Fair Chance Ordinance, we will consider qualified applicants with arrest and conviction records.

We are committed to providing reasonable accommodations to applicants with disabilities, and requests can be made via this link.

At OpenAI, we believe artificial intelligence has the potential to help people solve immense global challenges, and we want the upside of AI to be widely shared. Join us in shaping the future of technology.
