Software Engineer, Distributed Systems

16 hours ago


San Francisco CA United States OpenAI Full time

About the Team

The Platform Runtime team builds the low-level framework components to power our ML training systems. We work on building robust, scalable, high-performance components to support our distributed training workloads. Our priorities are to maximize the productivity of our researchers and our hardware, with the goal of accelerating progress towards AGI.

About the Role

As a Distributed Systems engineer, you will deliver powerful APIs that orchestrate thousands of computers moving and persisting vast amounts of data. This requires providing easy-to-use, introspectable systems that promote a fast debugging and development cycle, while scaling that experience to our newest supercomputers and maintaining stability and performance throughout.

We’re looking for people who love optimizing an end-to-end system and who understand high-performance I/O well enough to maximize performance both locally and across our supercomputers. We want someone excited by the rapid pace of responding to the dynamic and evolving needs of our training system architectures.

This role is based in San Francisco, CA. We use a hybrid work model of 3 days in the office per week and offer relocation assistance to new employees.

In this role, you will:

  1. Work across our Python and Rust stack
  2. Profile, optimize, and help design our compute and data capabilities for scale
  3. Work on deploying our training framework to our latest supercomputers, rapidly responding to the changing shapes and needs of the ML systems

You might thrive in this role if you:

  1. Have worked on large distributed systems
  2. Love figuring out how systems work and continuously come up with ideas for how to make them faster while minimizing complexity and maintenance burden
  3. Have strong software engineering skills and are proficient in Python and Rust, or equivalent languages

About OpenAI

OpenAI is an AI research and deployment company dedicated to ensuring that general-purpose artificial intelligence benefits all of humanity. We push the boundaries of the capabilities of AI systems and seek to safely deploy them to the world through our products. AI is an extremely powerful tool that must be created with safety and human needs at its core, and to achieve our mission, we must encompass and value the many different perspectives, voices, and experiences that form the full spectrum of humanity.

We are an equal opportunity employer and do not discriminate on the basis of race, religion, national origin, gender, sexual orientation, age, veteran status, disability, or any other legally protected status.

  • San Francisco, California, United States USM Business Systems Full time

    Job Title: Senior Software Engineer - Distributed Systems. We are seeking a highly skilled Senior Software Engineer to join our team in San Francisco, CA. As a key member of our development team, you will be responsible for designing and implementing scalable distributed systems using Java, Kafka, Cassandra, and Spring. About the Role: Develop high-performance,...

  • Software Engineer

    2 days ago


    San Francisco, United States High-Tech Professionals Full time

    Software Engineer - Distributed Systems Job ID: 1782 Location: San Francisco Bay Area Type: Permanent Status: Closed Key Skills: Distributed, parallel system software, C, C++, UNIX, storage architecture, cluster, database, storage IO data, full stack engineering, system development. Description: Seeking Software Engineer to design and build distributed...

  • Software Engineer

    6 days ago


    San Francisco, California, United States MongoDB Full time

    About MongoDB: MongoDB empowers innovators to build a better world by unleashing the power of software and data. Our industry-leading developer data platform, MongoDB Atlas, is the only globally distributed, multi-cloud database available in over 115 regions across major cloud providers. Our team is building cloud-based distributed systems software responsible...


  • San Francisco, United States Mixpanel Full time

    We are actively recruiting for multiple Software Engineers across different levels for our org! About the Role Mixpanel is powered by a custom distributed database. This system ingests more than 1 Trillion user-generated events every month while ensuring end-to-end latencies of under a minute and queries typically scan more than 1 Quadrillion events over the...


  • San Francisco, California, United States MongoDB Full time

    Company Overview: MongoDB empowers innovators to create, transform, and disrupt industries by unleashing the power of software and data. Our mission is to enable organizations of all sizes to easily build, scale, and run modern applications. Estimated Salary: $150,000 - $200,000 per year. Job Description: We are seeking a highly skilled Senior Software Engineer to...


  • San Francisco, California, United States Databricks Full time

    Role Overview: We are seeking a highly skilled Software Engineer to join our Runtime team at Databricks. This role involves building the next generation distributed data storage and processing systems that can outperform specialized SQL query engines in relational query performance, yet provide the expressiveness and programming abstractions to support diverse...


  • San Francisco, California, United States Amplitude Full time

    Amplitude is a leading digital analytics platform that empowers businesses to unlock the full potential of their products. With a portfolio of over 3,200 customers, including household names like Atlassian and Under Armour, our solutions provide unparalleled visibility into customer behavior and enable data-driven decision making. We're passionate about...


  • San Diego, CA, United States ZipRecruiter Full time

    Job Description We are seeking a software engineer with a passion for building and validating resilient distributed systems. At Canonical, you can build a career and drive the success of those leveraging Canonical's Ubuntu and Juju to build multi-cloud deployable cloud applications. We see quality engineering as a first-class engineering practice and are...


  • San Francisco, United States ZipRecruiter Full time

    Job Description. Position: Senior Distributed Systems Engineer. We are looking for a senior distributed systems engineer to join the Core Team (aka our Distributed Systems Team). Our Core Team handles the scheduling, planning, and execution of data syncing. They work on the systems that power our core syncing engine that other engineering teams, as well as...


  • San Francisco, United States San Francisco Compute Co. Full time

    About We’re the San Francisco Compute Company. We’re building the first real-time compute trading platform. We think that over the next decade, thousands of startups and labs are going to be training and serving large models. They need compute to do this, and we’re building a platform on which that compute can be traded. If we’re successful, it will...


  • San Francisco, United States salesforce Full time

    To get the best candidate experience, please consider applying for a maximum of 3 roles within 12 months to ensure you are not duplicating efforts. Job Category: Software Engineering. Job Details: About Salesforce: We’re Salesforce, the Customer Company, inspiring the future of business with AI + Data + CRM. Leading with our core values, we help companies across...

  • Software Engineer

    2 weeks ago


    San Francisco, United States Wayfinder Full time

    As a Distributed Systems Engineer at Browserbase, you’ll be directly responsible for developing our core web automation platform. You’ll ensure it is high performance, scalable, constantly evolving and growing, and that our customers know they can count on it. As a Distributed Systems Engineer at Browserbase, you will: Build, operate, and grow the...


  • San Francisco, California, United States GEICO Full time

    Position Overview: We are seeking an experienced Software Systems Engineer to join our team at GEICO. As a key member of our engineering organization, you will be responsible for designing, building, and maintaining scalable, resilient distributed systems that meet the needs of our customers. Key Responsibilities: Design and implement distributed systems that...


  • San Francisco, California, United States Cloudflare, Inc. Full time

    About Us: At Cloudflare, we're dedicated to building a better Internet. Our mission is to create a fast, secure and reliable network that powers millions of websites and applications worldwide. We're looking for talented individuals who share our vision and are passionate about developing high-performance distributed systems. As a Distributed Systems Engineer...


  • San Francisco, CA, United States Amazon Full time

    Software Engineer - AI/ML, AWS Neuron Distributed Training. Do you love decomposing problems to develop products that impact millions of people around the world? Would you enjoy identifying, defining, and building software solutions that revolutionize how businesses operate? The Annapurna Labs team at Amazon Web Services (AWS) is looking for a Software...