Software Engineer, Fleet Management
6 days ago
The Fleet team at OpenAI supports the computing environment that powers our cutting-edge research and product development. We oversee large-scale systems that span data centers, GPUs, networking, and more, ensuring high availability, performance, and efficiency. Our work enables OpenAI's models to operate seamlessly at scale, supporting both internal research and external products like ChatGPT. We prioritize safety, reliability, and responsible AI deployment over unchecked growth.

About the Role

The Software Engineer, Operating Systems & Orchestration will focus on building systems to manage hardware, configurations, vendors, and the people interacting with our infrastructure. You will design and develop solutions that integrate individual nodes and servers into unified clusters, directly contributing to advancing AI research by streamlining the overall research user experience.

This role is based in San Francisco, CA. We use a hybrid work model of 3 days in the office per week and offer relocation assistance to new employees.

In this role, you will:

* Design and build systems to manage both cloud and bare-metal fleets at scale.
* Develop tools that integrate low-level hardware metrics with high-level job scheduling and cluster management algorithms.
* Leverage LLMs to coordinate vendor operations and optimize infrastructure workflows.
* Automate infrastructure processes, reducing repetitive toil and improving system reliability.
* Collaborate with hardware, infrastructure, and research teams to ensure seamless integration across the stack.
* Continuously improve tools, automation, processes, and documentation to enhance operational efficiency.

You might thrive in this role if you:

* Have strong software engineering skills with experience in large-scale infrastructure environments.
* Possess broad knowledge of cluster-level systems (e.g., Kubernetes, CI/CD pipelines, Terraform, cloud providers).
* Have deep expertise in server-level systems (e.g., systemd, containerization, Chef, Linux kernels, firmware management, host routing).
* Are passionate about optimizing the performance and reliability of large compute fleets.
* Thrive in dynamic environments and are eager to solve complex infrastructure challenges.
* Value automation, efficiency, and continuous improvement in everything you build.
-
Software Engineer, Fleet Management
5 days ago
San Francisco, United States OpenAI Full time
The Fleet team at OpenAI supports the computing environment that powers our cutting-edge research and product development. We oversee large-scale systems that span data centers, GPUs, networking, and more, ensuring high availability, performance,...
-
Data Engineer
3 days ago
San Mateo, United States Vertex Sigma Software Full time
Data Engineer - Ride and Fleet Software. Hybrid (3 days onsite, 2 days remote).
We are building autonomous mobility from the ground up. The Ride and Fleet software teams connect our users to the rest of our infrastructure. They are responsible for delivering the autonomous ride service. They work to deliver a seamless, safe and reliable best-in-class ride...
-
Software Engineer, Fleet Management
2 weeks ago
San Francisco, United States OpenAI Full time
The Fleet team at OpenAI supports the computing environment that powers our cutting-edge research and product development. We oversee large-scale systems that span data centers, GPUs, networking, and more, ensuring high availability, performance, and efficiency. Our work enables OpenAI's models to operate seamlessly at scale, supporting both internal...
-
Software Engineer, Fleet Management
4 weeks ago
San Francisco, United States OpenAI Full time
The Fleet team at OpenAI supports the computing environment that powers our cutting-edge research and product development. We oversee large-scale systems that span data centers, GPUs, networking, and more, ensuring high availability, performance, and efficiency. Our work enables OpenAI's models to operate seamlessly at scale, supporting both internal...
-
Software Engineer, Fleet Management
7 hours ago
San Francisco, CA, United States OpenAI Full time
The Fleet team at OpenAI supports the computing environment that powers our cutting-edge research and product development. We oversee large-scale systems that span data centers, GPUs, networking, and more, ensuring high availability, performance, and efficiency. Our work enables OpenAI's models to operate seamlessly at scale, supporting both internal...
-
Software Engineer, Fleet Management
4 weeks ago
San Francisco, United States OpenAI Full time
The Fleet team at OpenAI supports the computing environment that powers our cutting-edge research and product development. We oversee large-scale systems that span data centers, GPUs, networking, and more, ensuring high availability, performance, and efficiency. Our work enables OpenAI's models to operate seamlessly at scale, supporting both internal...
-
Senior Software Engineer, Fleet
1 week ago
San Francisco, California, United States Hayden AI Full time
About Us
At Hayden AI, we are on a mission to harness the power of computer vision to transform the way transit systems and other government agencies address real-world challenges. From bus lane and bus stop enforcement to transportation optimization technologies and beyond, our innovative mobile perception system empowers our clients to accelerate transit,...
-
Software Engineer, Fleet Management
7 days ago
San Francisco, CA, United States OpenAI Full time
The Fleet team at OpenAI supports the computing environment that powers our cutting-edge research and product development. We oversee large-scale systems that span data centers, GPUs, networking, and more, ensuring high availability, performance, and efficiency. Our work enables OpenAI's models to operate seamlessly at scale, supporting both internal...
-
Software Engineer, Fleet Management
2 weeks ago
San Francisco, CA, United States OpenAI Full time
The Fleet team at OpenAI supports the computing environment that powers our cutting-edge research and product development. We oversee large-scale systems that span data centers, GPUs, networking, and more, ensuring high availability, performance, and efficiency. Our work enables OpenAI's models to operate seamlessly at scale, supporting both internal...
-
Software Engineer, Fleet Management
2 weeks ago
San Francisco, CA, United States OpenAI Full time
The Fleet team at OpenAI supports the computing environment that powers our cutting-edge research and product development. We oversee large-scale systems that span data centers, GPUs, networking, and more, ensuring high availability, performance, and efficiency. Our work enables OpenAI's models to operate seamlessly at scale, supporting both internal...