Distributed Training Systems Architect

3 days ago


San Francisco, California, United States OpenAI Full time
About OpenAI

OpenAI is a leading AI research and deployment company dedicated to ensuring that general-purpose artificial intelligence benefits all of humanity. We push the boundaries of the capabilities of AI systems and seek to safely deploy them to the world through our products.

AI is an extremely powerful tool that must be created with safety and human needs at its core, and to achieve our mission, we must encompass and value the many different perspectives, voices, and experiences that form the full spectrum of humanity.

Job Description

About the Role

We are seeking an experienced Distributed Training Systems Architect to join our team in San Francisco, CA. The role will focus on improving the training throughput of our internal training framework and enabling researchers to experiment with new ideas.

This requires good engineering skills, including designing, implementing, and optimizing state-of-the-art AI models, writing bug-free machine learning code, and acquiring deep knowledge of the performance of supercomputers.

Key Responsibilities:
  • Collaborate with researchers to enable them to develop systems-efficient video models and architectures
  • Apply the latest techniques to our internal training framework to achieve impressive hardware efficiency for our training runs
  • Profile and optimize our training framework
You Might Thrive in This Role If You:
  • Have experience working with multi-modal ML pipelines
  • Love diving deep into systems implementations and understanding their fundamentals in order to improve their performance and maintainability
  • Have strong software engineering skills and are proficient in Python
  • Have experience understanding and optimizing training kernels
  • Are passionate about understanding stable training dynamics


  • San Francisco, California, United States Anyscale Full time

    We're seeking a skilled Distributed Systems Architect to join our team at Anyscale. Our mission is to democratize distributed computing and make it accessible to software developers of all skill levels. As a Distributed Systems Architect, you'll play a key role in building the best place to run Ray, a popular open-source project that's creating an ecosystem...


  • San Francisco, California, United States Intelliswift Software Full time

    We are looking for a talented Distributed System Architect to design and implement our Kafka infrastructure at Intelliswift Software. The ideal candidate will have extensive experience with Confluent Kafka and be able to architect and implement scalable, high-performance distributed systems. Responsibilities include designing and implementing scalable Kafka...


  • San Francisco, California, United States MongoDB Full time

    MongoDB's Atlas Search Query team is looking for a highly skilled Distributed System Architect to join our ranks. As a key member of our team, you'll design and implement a cloud-based search service that allows users to execute complex search queries using the MongoDB Query Language. You'll work closely with our team to develop new features, optimize...


  • San Francisco, California, United States Nextdoor Full time

    Job Description: Welcome to the Core Services team at Nextdoor, where we operate critical high-throughput services that power communities worldwide. As a Distributed Systems Architect, you will work in a large-scale distributed system environment, identifying high-leverage opportunities to increase performance, scalability, and resilience. The primary focus of...


  • San Jose, California, United States Mindlance Full time

    Distributed Systems Architect Position: We are seeking an experienced Distributed Systems Architect to join our team. As a Distributed Systems Architect, you will be responsible for designing and building large-scale distributed systems. You will work closely with our engineering team to ensure that our systems are secure, scalable, and highly available. Key...


  • San Francisco, California, United States Ripple Full time

    About the Role: Ripple is seeking an experienced Senior Distributed Systems Architect to join our team. As a key member of our architecture team, you will be responsible for designing and developing complex distributed systems, leveraging your expertise in C++ and blockchain technology. Key Responsibilities: Lead the development of innovative architectural...


  • San Francisco, California, United States Cloudflare Inc Full time

    Company Overview: Cloudflare Inc is a pioneering technology company that aims to build a better Internet. With a massive network powering millions of websites and Internet properties, we provide protection and acceleration for any Internet application without requiring hardware, software installation, or code modifications. As a highly ambitious company with a...


  • San Francisco, California, United States Succinct Full time

    Innovate with Succinct, a pioneer in blockchain scaling, interoperability, and privacy solutions. As a Senior Software Engineer, you'll play a crucial role in developing our distributed proving cluster for SP1 and prover network in our San Francisco office. About the Role: This position requires expertise in architecting and maintaining a highly available...


  • San Francisco, California, United States Akka Full time

    Overview: Akka is a leader in the software industry, specializing in distributed systems and real-time streaming solutions. We are seeking a talented Field CTO to join our team and play a pivotal role in our sales strategy. Job Description: The Field CTO will be responsible for collaborating with clients to identify technical needs and provide tailored...


  • San Francisco, California, United States OpenAI Full time

    About OpenAI: We are a pioneering AI research and deployment company dedicated to making artificial intelligence a force for good. Our mission is to ensure that general-purpose AI benefits all of humanity. As a Distributed Systems/ML engineer, you will play a key role in improving the training throughput for our internal framework and enabling researchers...


  • San Francisco, California, United States Arbitrum Full time

    Senior Distributed Systems Engineer: We're looking for a seasoned professional to design and develop scalable, reliable, and high-performance distributed systems. This is an ideal opportunity for an engineer who is passionate about tackling complex problems in blockchain scalability, or looking to break into the field of blockchain engineering. If you're...


  • San Francisco, California, United States Akka Full time

    Job Overview: The Field CTO role at Akka is a pivotal position, responsible for driving customer engagement and product adoption through a deep understanding of our innovative software offerings. About Us: Akka is a leader in the software industry, specializing in distributed systems and real-time streaming solutions, with a focus on Akka, Java, Kalix, and...


  • San Diego, California, United States Apple Full time

    Job Description: Distributed Systems Architect - Cloud Engineering. Design novel distributed architectures to accelerate software build, test, and deployment. Combine problem domain expertise with established techniques to achieve high performance, reliability, and long-term maintainability. Analyze requirements, existing solutions, and systems to inform...

  • Network Architect

    1 week ago


    San Francisco, California, United States Broadcom Corporation Full time

    About the Role: We are seeking a Network Architect - Distributed Systems professional to join our team at Broadcom Corporation. As a key member of our VCF Division, you will have the opportunity to work on bleeding-edge network virtualization technologies, network overlays, and layer-2 switching. The VCF networking management stack is responsible for...


  • San Francisco, California, United States Nextdoor Full time

    Your Future Role: We are seeking a Distributed Datastore Architect to join our Core Services team. As a key member of this team, you will be responsible for designing and implementing a distributed, globally replicated graph store with high availability tiered caching. This datastore will provide predictable performance and enable product engineers to move...


  • San Francisco, California, United States Tbwa ChiatDay Inc Full time

    Alchemy, a pioneer in web3 development, seeks an experienced Backend Architect to join its team of experts. Our mission is to empower builders with the tools necessary to create exceptional on-chain products, leveraging our complete developer platform that offers powerful APIs, SDKs, and tools. With infrastructure powering 70% of top web3 teams, 90%+ of web2...


  • San Francisco, California, United States MongoDB Full time

    MongoDB is looking for an experienced Senior Software Engineer to lead the development of our Atlas Search Query team. As a key leader, you will set project-level strategy, architect features, and lead projects to successful execution. With experience in developing stateful distributed systems and designing high-volume query engines, you will be responsible...


  • San Francisco, California, United States OpenAI Full time

    About the Role: We are seeking a highly skilled Software Engineer to join our Data Acquisition team at OpenAI. The ideal candidate will have 4+ years of industry experience in software development, with a strong background in large stateful distributed systems and data processing. The successful candidate will have expertise in Kubernetes,...

  • Cloud Architect

    4 days ago


    San Jose, California, United States TikTok Full time

    Job Overview: We are seeking a highly skilled Cloud Architect - Distributed Systems to join our team at TikTok. As a key member of our infrastructure team, you will be responsible for designing and implementing scalable microservices architectures for our complex online systems.


  • San Francisco, California, United States Gridware Full time

    Company Overview: Gridware is a pioneering company in the field of grid management, dedicated to enhancing and protecting the electrical grid. Our team engineers advanced sensing systems to analyze both electrical and mechanical behavior of grid assets, identifying faults and enabling preemptive mitigation. We are headquartered in the Bay Area, California, and...