Software Engineer, Fleet Management

6 days ago


San Francisco, United States OpenAI Full time

The Fleet team at OpenAI supports the computing environment that powers our cutting-edge research and product development. We oversee large-scale systems that span data centers, GPUs, networking, and more, ensuring high availability, performance, and efficiency. Our work enables OpenAI’s models to operate seamlessly at scale, supporting both internal research and external products like ChatGPT. We prioritize safety, reliability, and responsible AI deployment over unchecked growth.

About the Role

The Software Engineer, Operating Systems & Orchestration will focus on building systems to manage hardware, configurations, vendors, and the people interacting with our infrastructure. You will design and develop solutions that integrate individual nodes and servers into unified clusters, directly contributing to advancing AI research by streamlining the overall research user experience.

This role is based in San Francisco, CA. We use a hybrid work model of 3 days in the office per week and offer relocation assistance to new employees.

In this role, you will:

  • Design and build systems to manage both cloud and bare-metal fleets at scale.
  • Develop tools that integrate low-level hardware metrics with high-level job scheduling and cluster management algorithms.
  • Leverage LLMs to coordinate vendor operations and optimize infrastructure workflows.
  • Automate infrastructure processes, reducing repetitive toil and improving system reliability.
  • Collaborate with hardware, infrastructure, and research teams to ensure seamless integration across the stack.
  • Continuously improve tools, automation, processes, and documentation to enhance operational efficiency.

You might thrive in this role if you:

  • Have strong software engineering skills with experience in large-scale infrastructure environments.
  • Possess broad knowledge of cluster-level systems (e.g., Kubernetes, CI/CD pipelines, Terraform, cloud providers).
  • Have deep expertise in server-level systems (e.g., systemd, containerization, Chef, Linux kernels, firmware management, host routing).
  • Are passionate about optimizing the performance and reliability of large compute fleets.
  • Thrive in dynamic environments and are eager to solve complex infrastructure challenges.
  • Value automation, efficiency, and continuous improvement in everything you build.



  • San Francisco, United States OpenAI Full time

    Software Engineer, Fleet Management Join to apply for the Software Engineer, Fleet Management role at OpenAI. The Fleet team at OpenAI supports the computing environment that powers our cutting‑edge research and product development. We oversee large‑scale systems that span data centers, GPUs, networking, and more, ensuring high availability, performance,...

  • Data Engineer

    3 days ago


    San Mateo, United States Vertex Sigma Software Full time

    Data Engineer - Ride and Fleet Software **Hybrid (3 days onsite, 2 days remote)** We are building autonomous mobility from the ground up. The Ride and Fleet software teams connect our users to the rest of our infrastructure. They are responsible for delivering the autonomous ride service. They work to deliver a seamless, safe and reliable best-in-class ride...


  • San Francisco, California, United States Hayden AI Full time

    About Us: At Hayden AI, we are on a mission to harness the power of computer vision to transform the way transit systems and other government agencies address real-world challenges. From bus lane and bus stop enforcement to transportation optimization technologies and beyond, our innovative mobile perception system empowers our clients to accelerate transit,...

