Software Engineer, Distributed Systems

1 month ago


San Francisco, United States OpenAI Full time

About the Team

The Platform Runtime team builds the low-level framework components to power our ML training systems. We work on building robust, scalable, high-performance components to support our distributed training workloads. Our priorities are to maximize the productivity of our researchers and our hardware, with the goal of accelerating progress towards AGI.

About the Role

As a Distributed Systems engineer, you will deliver powerful APIs that orchestrate thousands of computers moving and persisting vast amounts of data. This requires providing easy-to-use, introspectable systems that promote a fast debugging and development cycle, while also enabling that experience to scale to our newest supercomputers, maintaining stability and performance throughout.

We’re looking for people who love optimizing an end-to-end system and who understand high-performance I/O, maximizing performance both locally and distributed across our supercomputers. We want someone excited by the rapid pace of responding to the dynamic and evolving needs of our training systems architectures.

This role is based in San Francisco, CA. We use a hybrid work model of 3 days in the office per week and offer relocation assistance to new employees.

In this role, you will:

  1. Work across our Python and Rust stack
  2. Profile and optimize our compute and data capabilities, and help design them for scale
  3. Work on deploying our training framework to our latest supercomputers, rapidly responding to the changing shapes and needs of the ML systems

You might thrive in this role if you:

  1. Have worked on large distributed systems
  2. Love figuring out how systems work and continuously come up with ideas for how to make them faster while minimizing complexity and maintenance burden
  3. Have strong software engineering skills and are proficient in Python and Rust, or equivalent

About OpenAI

OpenAI is an AI research and deployment company dedicated to ensuring that general-purpose artificial intelligence benefits all of humanity. We push the boundaries of the capabilities of AI systems and seek to safely deploy them to the world through our products. AI is an extremely powerful tool that must be created with safety and human needs at its core, and to achieve our mission, we must encompass and value the many different perspectives, voices, and experiences that form the full spectrum of humanity.

We are an equal opportunity employer and do not discriminate on the basis of race, religion, national origin, gender, sexual orientation, age, veteran status, disability, or any other legally protected status.


  • San Francisco, California, United States USM Business Systems Full time

    Job Title: Senior Software Engineer - Distributed Systems. We are seeking a highly skilled Senior Software Engineer to join our team in San Francisco, CA. As a key member of our development team, you will be responsible for designing and implementing scalable distributed systems using Java, Kafka, Cassandra, and Spring. About the Role: Develop high-performance,...

  • Software Engineer

    4 weeks ago


    San Francisco, United States High-Tech Professionals Full time

    Software Engineer - Distributed Systems Job ID: 1782 Location: San Francisco Bay Area Type: Permanent Status: Closed Key Skills: Distributed, parallel system software, C, C++, UNIX, storage architecture, cluster, database, storage IO data, full stack engineering, system development. Description: Seeking Software Engineer to design and build distributed...


  • San Francisco, California, United States Nextdoor Full time

    Job Description: We are seeking a skilled Software Engineer to join our Core Services team at Nextdoor, responsible for operating critical high-throughput services that power communities worldwide. As a member of this team, you will work in a large-scale distributed system environment, identifying opportunities to increase performance, scalability, and...

  • Software Engineer

    2 weeks ago


    San Francisco, California, United States Gopowerev Full time

    Overview: GopowerEV is revolutionizing the EV charging industry with innovative solutions for multi-family properties. Job Description: We are seeking a seasoned Backend Software Engineer to join our team and help design and implement robust backend systems for our EV charging solutions. Key Responsibilities: Design and implement scalable, distributed systems and...

  • Software Engineer

    4 weeks ago


    San Francisco, California, United States MongoDB Full time

    About MongoDB: MongoDB empowers innovators to build a better world by unleashing the power of software and data. Our industry-leading developer data platform, MongoDB Atlas, is the only globally distributed, multi-cloud database available in over 115 regions across major cloud providers. Our team is building cloud-based distributed systems software responsible...


  • San Francisco, California, United States OpenAI Full time

    About the Role: We are seeking a skilled Distributed Systems engineer to join our team. As a key member, you will be responsible for designing and implementing powerful APIs that orchestrate thousands of computers and manage vast amounts of data. This requires a deep understanding of high-performance I/O and the ability to optimize end-to-end systems for...


  • San Francisco, California, United States Cisco Full time

    Overview: Cisco ThousandEyes is a Digital Experience Assurance platform that empowers organizations to deliver flawless digital experiences across every network. Our goal is to equip our customers with complete visibility into end-user connectivity, wherever they may be located. About the Role: This Senior Software Engineer will be working in the Endpoint team,...


  • San Francisco, California, United States Discord Full time

    About the Role: As a Staff Software Engineer at Discord, you will play a key role in building and maintaining our real-time features and services. With over 200 million active users per month, we are looking for someone who can help us scale our systems to meet the demands of our growing user base. With a strong understanding of distributed systems, you will be...


  • San Francisco, United States OpenAI Full time

    About the Team The Platform Runtime team builds the low-level framework components to power our ML training systems. We work on building robust, scalable, high-performance components to support our distributed training workloads. Our priorities are to maximize the productivity of our researchers and our hardware, with the goal of accelerating progress...


  • San Francisco, California, United States Eventual Computing Full time

    At Eventual Computing, we are building a cutting-edge data platform to help data scientists and engineers build data applications. As a Senior Software Engineer - Distributed Systems, you will play a key role in designing and implementing our distributed data engine Daft, which runs on 800k CPU cores daily. The ideal candidate has a strong foundation in...


  • San Francisco, United States Mixpanel Full time

    We are actively recruiting for multiple Software Engineers across different levels for our org! About the Role Mixpanel is powered by a custom distributed database. This system ingests more than 1 Trillion user-generated events every month while ensuring end-to-end latencies of under a minute and queries typically scan more than 1 Quadrillion events over the...


  • San Francisco, United States Amplitude Full time

    About The Role & Team: We're looking for a Staff Software Engineer to help build our query engine and tackle big challenges in a fast-growing data company. Our engineers are leading the efforts to drive our large-scale distributed systems to the 10x level while making innovations to our industry-leading analytics capabilities. As a Staff engineer of the Query...


  • San Francisco, California, United States Databricks Full time

    Role Overview: We are seeking a highly skilled Software Engineer to join our Runtime team at Databricks. This role involves building the next generation distributed data storage and processing systems that can outperform specialized SQL query engines in relational query performance, yet provide the expressiveness and programming abstractions to support diverse...


  • San Francisco, California, United States Amplitude Full time

    Amplitude is a leading digital analytics platform that empowers businesses to unlock the full potential of their products. With a portfolio of over 3,200 customers, including household names like Atlassian and Under Armour, our solutions provide unparalleled visibility into customer behavior and enable data-driven decision making. We're passionate about...


  • San Francisco, United States OpenAI Full time

    About the Team: The Platform Runtime team builds the low-level framework components to power our ML training systems. We work on building robust, scalable, high-performance components to support our distributed training workloads. Our priorities are to maximize the productivity of our researchers and our hardware, with the goal of accelerating progress...


  • San Francisco, California, United States Intelliswift Software Full time

    We are looking for a talented Distributed System Architect to design and implement our Kafka infrastructure at Intelliswift Software. The ideal candidate will have extensive experience with Confluent Kafka and be able to architect and implement scalable, high-performance distributed systems. Responsibilities include designing and implementing scalable Kafka...


  • San Francisco, California, United States Ripple Full time

    Company Overview: Ripple is a pioneering company that is changing the way value moves around the world. Our goal is to build a world where value can move like information does today, making it faster, cheaper, and more efficient. We are committed to innovation, collaboration, and customer satisfaction, and we strive to create a workplace culture that is...


  • San Francisco, California, United States Mixpanel Full time

    About Mixpanel: We are a leading product analytics software company, helping businesses answer critical questions about their products. Our event-based tracking solution enables teams to gain insights into user behavior across web and mobile platforms. We serve nearly 7,000 customers worldwide through seven offices globally. The Role: We are seeking an experienced...