Software Engineer, Agent Infrastructure

3 days ago


San Francisco, United States OpenAI Full time
About the Team

The Agent Infrastructure team at OpenAI designs robust and secure systems that power the training and advanced use cases of next-gen AI models. Our systems are a key component of the infrastructure that makes advanced reasoning models like OpenAI's o1 models possible, and we work hand-in-hand with researchers to train these models.

More advanced reasoning models have the capacity to solve hard problems, and our team is also building the platform on which AI models interact with the world and perform actions. We develop secure environments in which AI models can write and execute code, empowering the next significant milestone in AGI development: the deployment of advanced AI agents.

About the Role

On the Agent Infrastructure team, you will play a crucial role in designing and maintaining robust and secure systems that facilitate the training of next-gen AI models at a massive scale. You will work closely with researchers to enhance system capabilities and support experimental and production workloads. In addition, you will work directly on the systems that form the operating system for AI agents: building systems that allow models to write and execute code, interact with external systems, and take actions in secure environments. We're looking for people who have deep experience building AI infrastructure and who are used to working closely with researchers to build high-performance systems at massive scale for specialized use cases.

This role is based in San Francisco, CA. We use a hybrid work model of 3 days in the office per week and offer relocation assistance to new employees.

In this role, you will:
  • Push massive compute clusters to their limits. You will oversee the development and performance of extremely high-scale Kubernetes clusters, ensuring reliability and high throughput.
  • Use Terraform to stand up and evolve complex infrastructure that powers RL training and advanced model use cases.
  • Collaborate with research teams to stand up and optimize systems for novel AI training runs and experimental applications.
  • Develop and maintain FastAPI and gRPC APIs that serve as the interface for our large-scale AI computing environment, used in both training and inference.
You might thrive in this role if you have:
  • Willingness to leave no stone unturned in pursuit of solving a problem.
  • Experience building backend infrastructure that is easy to maintain.
  • Experience building products that end users interface with.
  • Experience making things go fast (most things start in Python, but we expect to move increasingly towards Rust for more robust performance).
Nice to have:
  • A team-player attitude and willingness to do a variety of tasks that move the team forward.
  • Experience working on large-scale machine learning infrastructure and distributed systems.
  • The ability to reason about training at scale, identifying bottlenecks and engineering solutions to optimize system performance in training environments.
  • Familiarity with cloud platforms and infrastructure-as-code tools like Terraform.
  • Mastery of multiple programming languages and comfort spinning up new services from scratch.


About OpenAI

OpenAI is an AI research and deployment company dedicated to ensuring that general-purpose artificial intelligence benefits all of humanity. We push the boundaries of the capabilities of AI systems and seek to safely deploy them to the world through our products. AI is an extremely powerful tool that must be created with safety and human needs at its core, and to achieve our mission, we must encompass and value the many different perspectives, voices, and experiences that form the full spectrum of humanity.

We are an equal opportunity employer and do not discriminate on the basis of race, religion, national origin, gender, sexual orientation, age, veteran status, disability or any other legally protected status.

OpenAI Affirmative Action and Equal Employment Opportunity Policy Statement

For US Based Candidates: Pursuant to the San Francisco Fair Chance Ordinance, we will consider qualified applicants with arrest and conviction records.

We are committed to providing reasonable accommodations to applicants with disabilities, and requests can be made via this link.

OpenAI Global Applicant Privacy Policy

At OpenAI, we believe artificial intelligence has the potential to help people solve immense global challenges, and we want the upside of AI to be widely shared. Join us in shaping the future of technology.
