Distributed Training Systems Engineer

4 days ago


Mountain View, California, United States Waymo Full time

Overview: At Waymo, we're working towards a future where everyone can get where they need to go without needing a car. We're looking for a skilled Machine Learning Engineer, Training, to help us achieve this goal.

Key Responsibilities: In this hybrid role, you will report to the Technical Lead Manager of Machine Learning Training. Your primary responsibilities will include developing infrastructure components necessary for distributed training, implementing automation solutions for provisioning, deployment, monitoring, and scaling of distributed training infrastructure, monitoring system health, and identifying performance bottlenecks and optimization opportunities.

Requirements: To succeed in this role, you'll need a Bachelor's degree in Computer Science, Engineering, or a related field, or 2+ years of equivalent experience. You should have experience with distributed systems principles, solid Python or C++ skills, and prior experience with Machine Learning frameworks (e.g., TensorFlow, PyTorch) and distributed training algorithms.

What We Offer: As a Waymo employee, you'll enjoy a salary range of $158,000 - $200,000 USD annually, based on experience and qualifications. You'll also be eligible to participate in our discretionary annual bonus program, equity incentive plan, and generous Company benefits program, subject to eligibility requirements.



  • Mountain View, California, United States Waymo Full time

    About the Company: Waymo is a leader in autonomous driving technology, working to improve access to mobility while saving thousands of lives. Since 2009, we've focused on building the Waymo Driver, the world's most experienced driver, using cutting-edge artificial intelligence and machine learning algorithms. Job Summary: We're seeking an experienced...


  • Mountain View, California, United States Waymo Full time

    About Waymo: Waymo is an innovative autonomous driving technology company with a mission to provide the most trusted driver. Our team has been focused on building the Waymo Driver, the world's most experienced driver, to improve access to mobility and save thousands of lives lost to traffic crashes. The Waymo Driver powers our fully autonomous ride-hailing...


  • Mountain View, California, United States Waymo Full time

    Job Description: This Hybrid role reports to our TLM of Machine Learning Training and involves: developing the infrastructure components necessary for distributed training; implementing automation solutions for provisioning, deployment, monitoring, and scaling of distributed training infrastructure; monitoring system health and performing routine maintenance tasks...


  • Mountain View, California, United States Intrinsic Full time

    Job Summary: We're seeking an exceptional Distributed Systems Engineer to join our team. As a key contributor, you will play a critical role in designing and implementing a distributed cloud and on-premises system that enables users worldwide to develop and deploy automation solutions. Your expertise in distributed systems, cloud computing, and robotics will...


  • Mountain View, California, United States Waymo Full time

    About the Job: The Waymo ML Infrastructure team is seeking an experienced Senior Machine Learning Engineer, Training, to work on developing infrastructure components for distributed training and implementing automation solutions for provisioning, deployment, monitoring, and scaling of distributed training infrastructure. This Hybrid role requires: Developing the...


  • Mountain View, California, United States Waymo Full time

    Job Summary: Waymo is looking for a skilled Senior Machine Learning Engineer, Training, to join our Hybrid team. In this role, you will develop the infrastructure components necessary for distributed training, implement automation solutions, and monitor system health. If you have experience building distributed systems and working with Machine Learning...


  • Mountain View, California, United States Waymo Full time

    Company Overview: Waymo is a pioneering autonomous driving technology company dedicated to creating the world's most trusted driver. With its roots in the Google Self-Driving Car Project, Waymo has been working tirelessly since 2009 to build the Waymo Driver, an AI system designed to improve access to mobility while saving countless lives lost to traffic...


  • Mountain View, California, United States Waymo Full time

    Taking Autonomous Driving to the Next Level: At Waymo, we're pushing the boundaries of what's possible with autonomous driving technology. As a Senior Distributed Systems Developer, you'll have the chance to work on high-impact projects that drive innovation and growth. About the Position: Design and develop scalable distributed training infrastructure...


  • Mountain View, California, United States Databricks Full time

    Journey to Data Innovation: Join Databricks as a Staff Software Engineer and embark on a journey to revolutionize data processing and analysis. Our mission is to simplify the entire data lifecycle from ingestion to ETL, BI, and ML/AI with a unified platform, leveraging the power of Lakehouse architecture. You'll work on building next-generation systems for...


  • Mountain View, California, United States Waymo Full time

    Job Description: Waymo is an autonomous driving technology company with the mission to become the most trusted driver. We are seeking a skilled Machine Learning Distributed Systems Developer to join our Hybrid team. In this role, you will report to our TLM of Machine Learning Training and work closely with Research and Production teams to develop models in...


  • Mountain View, California, United States LinkedIn Full time

    Job Description: We're looking for a seasoned Distributed Systems Architect to join our world-class software engineering team at LinkedIn. As a critical member of our infrastructure team, you'll play a pivotal role in shaping the next-generation infrastructure and platforms that power our platform. With a focus on building scalable, secure, and reliable...


  • Mountain View, California, United States TikTok Full time

    About Us: TikTok is a leading short-form mobile video platform with a mission to inspire creativity and bring joy. Job Overview: We are seeking a highly skilled Distributed Systems Engineer to join our Data Platform team. Key Responsibilities: Design, build, and maintain large-scale distributed systems to support our core products and...


  • Mountain View, California, United States Google Inc. Full time

    About the Role: As a Senior Software Engineer at Google Inc., you will be part of a team responsible for designing, developing, and maintaining large-scale distributed systems. Your primary focus will be on ensuring the reliability, uptime, and scalability of our services. Responsibilities: Lead a team of engineers in the development of software solutions to...


  • Mountain View, California, United States Nuro Full time

    Overview: Nuro is a cutting-edge robotics company that's changing the game with its autonomous driving technology. As a leader in the industry, we're always pushing the boundaries of innovation. Our team is passionate about developing cutting-edge solutions that make a real difference in people's lives. We're currently looking for a skilled Machine...


  • Mountain View, California, United States Elastic Full time

    About the Role: We are seeking a Distributed System Architect to join our team. As a member of this team, you will be responsible for designing, building, and maintaining software supporting our cloud offerings and on-prem services, and participating in coding, technical design, crafting solutions, debugging complicated failure scenarios, prioritizing...


  • Mountain View, California, United States Qualified Technical Services Full time

    Job Summary: We are seeking a highly skilled Software Systems Engineer to join our team at NASA Ames Research Center in Mountain View, CA. This role involves developing the core infrastructure for autonomous coordination between spacecraft and ground applications. About the Project: The Distributed Spacecraft Autonomy project is centered out of NASA Ames...


  • Mountain View, California, United States Databricks Full time

    This role requires an experienced software engineer who can develop and maintain complex software systems. The ideal candidate will have a strong background in computer science, experience with distributed systems, and a passion for innovation. In addition to a competitive salary, Databricks offers a comprehensive benefits package, including health...


  • Mountain View, California, United States Waymo Full time

    Waymo is a pioneering autonomous driving technology company with a mission to revolutionize mobility while prioritizing safety. We develop the infrastructure components necessary for distributed training, including job scheduling, resource management, data distribution, and model synchronization. Implement automation solutions for provisioning, deployment,...


  • Mountain View, California, United States Microsoft Corporation Full time

    Required Skills and Qualifications: Bachelor's Degree in Computer Science or related technical discipline and 6+ years of technical engineering experience with coding in languages including C, C++, C#, Java, JavaScript, or Python. 3+ years of experience in people management, demonstrating the ability to lead and influence across teams. Experience in developing...


  • Mountain View, California, United States Waymo Full time

    Job Description: We are looking for an exceptional Backend Software Engineer to design and develop platform and infrastructure that supports various ride-hailing businesses. As a key member of our TaaS Infrastructure team, you will provide APIs for both our first-party service and partner services, optimize the marketplace to balance supply and utilization of...