Senior Infrastructure Engineer, GPU Platform

1 day ago


San Francisco, California, United States OpenAI Full time
About the Team

The Applied Engineering team at OpenAI is a collaborative group that works across research, engineering, product, and design to bring cutting-edge AI technology to consumers and businesses. Our team is responsible for running the infrastructure that supports the models backing ChatGPT and the API, including inference Kubernetes clusters, GPU health, InfiniBand performance, node lifecycle, and more.

About the Role

The inference compute team at OpenAI builds and maintains infrastructure abstractions that enable the company to run models at scale. As a key member of this team, you will design and build the inference infrastructure that powers our products, ensuring reliability and performance. You will also be responsible for ensuring our infrastructure can scale to the next order of magnitude, helping create a diverse, equitable, and inclusive culture, and participating in an on-call rotation to respond to critical incidents as needed.

Responsibilities
  • Design and build scalable, reliable, and secure infrastructure abstractions that enable the company to run models at scale.
  • Ensure our infrastructure can scale to the next order of magnitude, meeting the demands of a rapidly growing business.
  • Collaborate with cross-functional teams to create a diverse, equitable, and inclusive culture that makes everyone feel welcome while enabling radical candor and the challenging of groupthink.
  • Participate in an on-call rotation to respond to critical incidents as needed, ensuring the reliability of the systems we build.
Requirements
  • 10+ years of experience building core infrastructure.
  • Experience running GPU clusters at scale.
  • Experience operating orchestration systems such as Kubernetes at scale.
  • Pride in building and operating scalable, reliable, secure systems.
  • Comfort with ambiguity and rapid change.


  • San Francisco, California, United States OpenAI Full time

    About the Team: The Applied Engineering team at OpenAI is a collaborative group that works across research, engineering, product, and design to bring the company's technology to consumers and businesses. This team is responsible for running the infrastructure that supports the models backing ChatGPT and the API, including inference Kubernetes clusters, GPU...


  • San Francisco, California, United States OpenAI Full time

    About the Team: The Applied Engineering team at OpenAI is a collaborative group that works across research, engineering, product, and design to bring the company's technology to consumers and businesses. You'll be part of the team responsible for running the infrastructure that supports the models backing ChatGPT and the API. Our systems include inference...


  • San Francisco, California, United States OpenAI Full time

    About the Team: The Applied Engineering team is a dynamic group that works across research, engineering, product, and design to bring OpenAI's technology to consumers and businesses. You'll join the team responsible for running the infrastructure that supports the models backing ChatGPT and the API. The systems we support include inference Kubernetes clusters,...


  • San Francisco, California, United States Scale AI, Inc. Full time

    About the Role: We are seeking a highly skilled Cloud Infrastructure Engineer to join our Platform Engineering team at Scale AI, Inc. As a key member of our team, you will be responsible for designing and developing core cloud infrastructure platforms and systems, while supporting orchestration, data abstraction, data pipelines, identity & access management,...

  • Senior GPU Engineer

    1 week ago


    San Francisco, California, United States Succinct Full time

    About Succinct: Succinct is a pioneering company in the field of zero-knowledge proofs, dedicated to making this complex technology accessible to developers. Our mission is to empower developers to build scalable, interoperable, and private blockchain solutions. The Role: We are seeking a Senior GPU Engineer to join our team and contribute to the development of...


  • San Francisco, California, United States Apple Full time

    About the Role: We are seeking a highly skilled Senior Engineering Program Manager to join our Machine Learning Compute Infrastructure team at Apple. As a key member of our team, you will be responsible for establishing cross-functional partnerships with internal stakeholders to drive the development and deployment of our machine learning platforms. Key...


  • San Jose, California, United States Samsung Electronics Full time

    Job Title: Senior GPU Performance Engineer. At Samsung Electronics, we are seeking a highly skilled Senior GPU Performance Engineer to join our team. As a key member of our Xclipse GPU software team, you will be responsible for delivering cutting-edge technologies to revolutionize the mobile GPU market. Key Responsibilities: Optimize and fine-tune the...


  • San Francisco, California, United States Apple Full time

    About the Role: We are seeking a highly skilled Senior Engineering Program Manager to join our Machine Learning Platform & Technology (MLPT) team at Apple. As a key member of our team, you will be responsible for driving and influencing our compute roadmap for improving engineering efficiencies, reducing cost, and ensuring resiliency for Apple's ML use...


  • San Francisco, California, United States Apple Inc. Full time

    About the Role: We are seeking a highly skilled Senior Engineering Program Manager to join our Machine Learning Platform & Technology (MLPT) team at Apple Inc. This is a unique opportunity to lead and drive the development of our next-generation Machine Learning models and infrastructure. Key Responsibilities: Partner with engineering teams and product/program...


  • San Francisco, California, United States Apple Full time

    About the Role: We are seeking a highly skilled Senior Engineering Program Manager to join our Machine Learning Platform and Technology (MLPT) team at Apple. As a key member of our team, you will be responsible for driving the development and implementation of our Machine Learning Compute Platform, which provides services to all internal Apple developers...


  • San Francisco, California, United States Apple Inc. Full time

    About the Role: We are seeking a highly skilled Senior Engineering Program Manager to join our Machine Learning Platform & Technology (MLPT) team at Apple. As a key member of our team, you will be responsible for driving the development and implementation of our ML Compute Platform, which provides services to all internal Apple developers focused on providing...


  • San Francisco, California, United States Apple Full time

    About the Role: We are seeking a highly skilled Senior Engineering Program Manager to join our Machine Learning Platform & Technology (MLPT) team at Apple. As a key member of our team, you will be responsible for establishing cross-functional partnerships with all of Apple's ML partners to improve the ease of use of our compute services. Key...


  • San Francisco, California, United States Apple Inc. Full time

    About the Role: We are seeking a highly skilled Senior Engineering Program Manager to join our AIML team at Apple Inc. in the Machine Learning Platform & Technology (MLPT) organization. As a key member of our team, you will be responsible for establishing cross-functional partnerships with all of Apple's ML partners to understand their use cases and improve...


  • San Jose, California, United States Samsung Electronics Full time

    Job Summary: Samsung Electronics is seeking a highly skilled Senior GPU Performance Engineer to join our Xclipse GPU software team. As a key member of our team, you will be responsible for developing and optimizing GPU IP from architectural planning to productization, ensuring the highest level of performance and efficiency in our products. Key...


  • San Francisco, California, United States Apple Full time

    About the Role: We are seeking a highly skilled Senior Engineering Program Manager to join our Machine Learning Platform & Technology (MLPT) team at Apple. As a key member of our team, you will be responsible for driving the development and implementation of our ML Compute Platform, which provides services to all internal Apple developers focused on providing...


  • San Francisco, California, United States Apple Full time

    About the Role: We are seeking a highly skilled Senior Engineering Program Manager to join our Machine Learning Platform & Technology (MLPT) team at Apple. As a key member of our team, you will be responsible for establishing cross-functional partnerships with all of Apple's ML partners to understand their use cases and improve the ease of use of our compute...


  • San Francisco, California, United States Apple Full time

    About the Role: We are seeking a highly skilled Senior Engineering Program Manager to join our Machine Learning Platform & Technology (MLPT) team at Apple. As a key member of our team, you will be responsible for establishing cross-functional partnerships with all of Apple's ML partners, understanding their use cases, and improving the ease of use of our...

