Distributed Training Infrastructure Engineer

7 days ago


Mountain View, California, United States Waymo Full time
About Waymo

Waymo is an innovative autonomous driving technology company with a mission to provide the most trusted driver. Our team has been focused on building the Waymo Driver, the world's most experienced driver, to improve access to mobility and save thousands of lives lost to traffic crashes.

The Waymo Driver powers our fully autonomous ride-hailing service, Waymo One, and can be applied to various vehicle platforms and product use cases. With over one million rider-only trips and tens of millions of miles driven autonomously on public roads, we are making significant progress in this field.

Distributed Training Infrastructure Engineer Role

We are seeking a Distributed Training Infrastructure Engineer to join our hybrid team. As a key member of our Machine Learning Infrastructure team, you will work closely with Research and Production teams to develop models in Perception and Planning that are core to our autonomous driving software.

Your primary responsibility will be to develop the infrastructure components necessary for distributed training, including job scheduling, resource management, data distribution, and model synchronization. You will also implement automation solutions for provisioning, deployment, monitoring, and scaling of distributed training infrastructure to improve operations and reliability.

In this role, you will:

  • Design and implement scalable distributed training systems
  • Develop tools and libraries to enhance TensorFlow and JAX
  • Optimize performance and efficiency of distributed training pipelines
  • Collaborate with cross-functional teams to integrate distributed training infrastructure

Requirements

To be successful in this role, you will need:

  • Bachelor's degree in Computer Science, Engineering, or related field, or equivalent experience
  • Experience with distributed systems principles and building distributed systems for production environments
  • Solid Python or C++ skills
  • Prior experience with Machine Learning frameworks (e.g., TensorFlow, PyTorch) and distributed training algorithms
  • Ability to debug complex distributed systems issues

About the Role

This full-time position offers a competitive salary range of $158,000-$200,000 USD, depending on experience and location. Additionally, you will have opportunities to participate in Waymo's discretionary annual bonus program, equity incentive plan, and generous Company benefits program.

Our team is committed to creating a supportive and inclusive environment where everyone feels valued and empowered to contribute their best work. We strive to foster a culture that promotes diversity, equity, and inclusion, and we are committed to making Waymo a great place to work for all employees.



  • Mountain View, California, United States Waymo Full time

    Company Overview: Waymo is a pioneering autonomous driving technology company dedicated to creating the world's most trusted driver. With its roots in the Google Self-Driving Car Project, Waymo has been working tirelessly since 2009 to build the Waymo Driver, an AI system designed to improve access to mobility while saving countless lives lost to traffic...


  • Mountain View, California, United States Waymo Full time

    About the Company: Waymo is a leader in autonomous driving technology, working to improve access to mobility while saving thousands of lives. Since 2009, we've focused on building the Waymo Driver, the world's most experienced driver, using cutting-edge artificial intelligence and machine learning algorithms. Job Summary: We're seeking an experienced...


  • Mountain View, California, United States Waymo Full time

    About Waymo: Waymo is a pioneering autonomous driving technology company dedicated to revolutionizing the way people move. With a mission to be the most trusted driver, we have been at the forefront of this industry since its inception as the Google Self-Driving Car Project in 2009. Our journey has been marked by significant milestones, including providing over...


  • Mountain View, California, United States Nuro Full time

    About Nuro: Nuro exists to better everyday life through robotics. Founded in 2016, we have developed autonomous driving (AD) technology and commercialized AD applications. Our world-class autonomous driving system combines AD hardware with our generalized AI-first self-driving software. Our system is built to learn and improve through data and is one of the...


  • Mountain View, California, United States ThisWay Full time

    Job Overview: ThisWay is seeking a highly skilled Reliability Engineer - Infrastructure Specialist in Mountain View, CA. The role involves supporting the design, build, and maintenance of multi-cloud platforms, including cloud-hosted and serverless services. Key Responsibilities: Lead technical initiatives to enhance the reliability of global infrastructure...


  • Mountain View, California, United States Applied Intuition Full time

    About the Role: We are seeking a Senior Engineer to own and maintain our High-Definition (HD) maps infrastructure. Our product suite utilizes HD maps to solve customer needs in various applications, including localized information calculation, global map querying, and data visualization. This role offers an opportunity to shape the future of our maps by...


  • Mountain View, California, United States TikTok Full time

    About Us: TikTok is a leading short-form mobile video platform with a mission to inspire creativity and bring joy. Job Overview: We are seeking a highly skilled Distributed Systems Engineer to join our Data Platform team. Key Responsibilities: Design, build, and maintain large-scale distributed systems to support our core products and...


  • Mountain View, California, United States NewsBreak Full time

    About NewsBreak: NewsBreak is revolutionizing the way users interact with local news and their communities by bridging local users, content creators, and businesses. We foster safer, more vibrant, and authentically connected lives through robust collaborations with thousands of local publishers and businesses across the nation. Our Mission: We are redefining the...


  • Mountain View, California, United States LinkedIn Full time

    Role Overview: At LinkedIn, we're building a platform that helps professionals achieve more in their careers. As a Systems & Infrastructure Engineering Intern, you will play a key role in building and supporting large-scale systems, and utilizing distributed systems and algorithms to help scale LinkedIn's infrastructure to handle massive growth in membership,...


  • Mountain View, California, United States Waymo Full time

    About the Role. Job Summary: Waymo is an autonomous driving technology company with a mission to be the most trusted driver. As a member of the Machine Learning Infrastructure team, you will work closely with Research and Production teams to develop models in Perception and Planning that are core to our autonomous driving software. About the Team: The team focuses...


  • Mountain View, California, United States Nuro Full time

    About Us: Nuro is a robotics company that aims to improve daily life through autonomous driving technology. Our team has spent years developing a world-class autonomous driving system, the Nuro Driver™, which combines AD hardware with our AI-first self-driving software. Job Summary: We are looking for a Senior Software Engineer, ML Data Infrastructure to join...


  • Mountain View, California, United States LinkedIn Full time

    Company Overview: LinkedIn is the world's largest professional network, built to create economic opportunity for every member of the global workforce. Our products help people make powerful connections, discover exciting opportunities, build necessary skills, and gain valuable insights every day. We're also committed to providing transformational...


  • Mountain View, California, United States Joyent Full time

    About the Role: We are seeking a talented Cloud Infrastructure Developer to join our dynamic DX (Developer Experience) Team at Joyent. As a key member of this team, you will play a crucial part in designing and developing the next generation Kubernetes cloud web application for our cloud-based platform. Key Responsibilities: System Design and Architecture:...


  • Mountain View, California, United States Hireio, Inc. Full time

    About the Role: Hireio, Inc. is looking for a seasoned Tech Lead Manager to spearhead our developer infrastructure engineering team behind mobile applications serving billions of users worldwide. We're seeking an expert in building client-side mobile infrastructure with a strong track record in leading software engineering teams to improve app-wide performance...


  • Mountain View, California, United States ID Full time

    ID.me is a pioneering enterprise software company that simplifies identity verification and sharing online. We empower individuals to control their data through a secure, portable login, eliminating the need for multiple passwords across websites. Our digital identity network boasts 117 million registered members, used by 14 federal agencies, 30 states, and...


  • Mountain View, California, United States Nuro Full time

    About Nuro: Nuro is a robotics company that aims to improve everyday life through innovative technology. Founded in 2016, the company has spent years developing autonomous driving solutions. With over $2 billion in funding from top investors, Nuro has partnered with leading brands to revolutionize the way goods are transported. About the Role: Nuro is seeking a...


  • Mountain View, California, United States DeepMind Full time

    We're seeking an experienced Infrastructure Engineer to lead and drive technology projects in our workplace. As an Enterprise Engineering Manager, you will be responsible for the planning, maintenance, and security of a diverse range of systems, including Quantum Stornext storage arrays, hypervisors, Active Directory Servers, Linux, Windows, and Mac...


  • Mountain View, California, United States iSoftTek Solutions Inc Full time

    Job Title: Senior Cloud Engineer - Kubernetes Expert. Location: Mountain View, CA. Job Type: Full-time W2 position. Duration: Long-term engagement. We are seeking an experienced Senior Cloud Engineer to join our team as a Kubernetes expert. The ideal candidate will have at least 7 years of experience working with GCP and a strong background in Linux, shell...


  • Mountain View, California, United States Hireio, Inc. Full time

    About the Role: We are seeking a seasoned Tech Lead Manager to spearhead our Developer Infrastructure team. This individual will be responsible for leading a team of software engineers to develop and maintain large-scale services, frameworks, tools, and systems. The ideal candidate will have a strong technical background, with experience in mobile...


  • Mountain View, California, United States Databricks Inc. Full time

    Databricks is seeking a seasoned Cloud Infrastructure Lead to shape the future of Databricks' infrastructure through data science. This position will tackle some of the most complex challenges related to capacity planning, performance optimization, reliability engineering, infrastructure efficiency, and customer experience. This individual will lead a team of...