Distributed Training Specialist

6 days ago


Mountain View, California, United States Waymo Full time
About the Job

The Waymo ML Infrastructure team is seeking an experienced Senior Machine Learning Engineer, Training, to develop infrastructure components for distributed training and to implement automation solutions for provisioning, deployment, monitoring, and scaling of distributed training infrastructure.

This hybrid role involves:

  • Developing the necessary infrastructure components for distributed training
  • Implementing automation solutions for provisioning, deployment, monitoring, and scaling of distributed training infrastructure
  • Monitoring system health, diagnosing issues, and performing routine maintenance tasks to ensure the reliability of the distributed training infrastructure
  • Identifying performance bottlenecks and optimization opportunities
  • Improving the developer experience and performance of our scalable ML framework

Your Skills and Experience

  • Bachelor's degree in Computer Science, Engineering, or a related field, or 4+ years of equivalent experience
  • Experience building distributed systems for production environments
  • Solid Python or C++ skills
  • Prior experience with Machine Learning frameworks (e.g., TensorFlow, PyTorch) and distributed training algorithms

Compensation and Benefits

  • Competitive salary range: $192,000 - $243,000 USD
  • Discretionary annual bonus program
  • Equity incentive plan
  • Generous company benefits program


  • Mountain View, California, United States Waymo Full time

    Company Overview: Waymo is a pioneering autonomous driving technology company dedicated to creating the world's most trusted driver. With its roots in the Google Self-Driving Car Project, Waymo has been working tirelessly since 2009 to build the Waymo Driver, an AI system designed to improve access to mobility while saving countless lives lost to traffic...


  • Mountain View, California, United States Waymo Full time

    Job Description: This hybrid role reports to our TLM of Machine Learning Training and involves: Developing the infrastructure components necessary for distributed training; Implementing automation solutions for provisioning, deployment, monitoring, and scaling of distributed training infrastructure; Monitoring system health and performing routine maintenance tasks...


  • Mountain View, California, United States Waymo Full time

    Job Summary: Waymo is looking for a skilled Senior Machine Learning Engineer, Training to join our hybrid team. In this role, you will develop the infrastructure components necessary for distributed training, implement automation solutions, and monitor system health. If you have experience building distributed systems and working with Machine Learning...


  • Mountain View, California, United States Waymo Full time

    About Waymo: Waymo is an innovative autonomous driving technology company with a mission to provide the most trusted driver. Our team has been focused on building the Waymo Driver, the world's most experienced driver, to improve access to mobility and save thousands of lives lost to traffic crashes. The Waymo Driver powers our fully autonomous ride-hailing...


  • Mountain View, California, United States Waymo Full time

    About the Company: Waymo is a leader in autonomous driving technology, working to improve access to mobility while saving thousands of lives. Since 2009, we've focused on building the Waymo Driver, the world's most experienced driver, using cutting-edge artificial intelligence and machine learning algorithms. Job Summary: We're seeking an experienced...


  • Mountain View, California, United States Waymo Full time

    Overview: At Waymo, we're working towards a future where everyone can get where they need to go without needing a car. We're looking for a skilled Machine Learning Engineer, Training to help us achieve this goal. Key Responsibilities: In this hybrid role, you will report to the Technical Lead Manager of Machine Learning Training. Your primary responsibilities...


  • Mountain View, California, United States Nuro Full time

    Overview: Nuro is a cutting-edge robotics company that's changing the game with its autonomous driving technology. As a leader in the industry, we're always pushing the boundaries of innovation. Our team is passionate about developing cutting-edge solutions that make a real difference in people's lives. We're currently looking for a skilled Machine...


  • Mountain View, California, United States Databricks Full time

    About the Job: We're looking for a talented Distributed Systems Optimization Specialist to join our team at Databricks. In this role, you'll be responsible for optimizing the performance of our data and AI platform, ensuring it meets the needs of our customers. The Impact You'll Have: Identify performance limitations of our entire stack based on telemetry,...

  • Training Specialist

    22 hours ago


    Mountain View, California, United States RODGERS CONSULTING SERVICE INC Full time

    Rodgers Consulting Services is a state-licensed organization offering support to individuals with developmental disabilities, helping them achieve independence and excel in life. We are seeking a Training Specialist to join our team. Job Summary: To provide independent living skills training for adult consumers with developmental disabilities. Implement...


  • Mountain View, California, United States Waymo Full time

    Taking Autonomous Driving to the Next Level: At Waymo, we're pushing the boundaries of what's possible with autonomous driving technology. As a Senior Distributed Systems Developer, you'll have the chance to work on high-impact projects that drive innovation and growth. About the Position: Design and develop scalable distributed training infrastructure...


  • Mountain View, California, United States Waymo Full time

    Job Description: Waymo is an autonomous driving technology company with the mission to become the most trusted driver. We are seeking a skilled Machine Learning Distributed Systems Developer to join our hybrid team. In this role, you will report to our TLM of Machine Learning Training and work closely with Research and Production teams to develop models in...


  • Mountain View, California, United States Qualified Technical Services Full time

    Job Summary: We are seeking a highly skilled Software Systems Engineer to join our team at NASA Ames Research Center in Mountain View, CA. This role involves developing the core infrastructure for autonomous coordination between spacecraft and ground applications. About the Project: The Distributed Spacecraft Autonomy project is centered out of NASA Ames...


  • Mountain View, California, United States TikTok Full time

    Explore an exciting opportunity at TikTok, a leading platform for short-form mobile video. As a USDS Training and Development Specialist, you will play a critical role in creating and delivering training programs that enhance the skills of content moderators worldwide. About the Role: We are seeking a seasoned Training and Development Specialist to join our...


  • Mountain View, California, United States Alameda Health System Full time

    Job Summary: We are seeking a highly skilled Clinical Educator and Training Specialist to join our team at Alameda Health System. This is an exciting opportunity for a motivated individual to design, implement, and evaluate strategically aligned programs that contribute to the orientation and ongoing competence of staff. The ideal candidate will have a strong...

  • Training Specialist

    13 hours ago


    Mountain View, California, United States RODGERS CONSULTING SERVICE INC Full time

    Job Title: A Service Instructor is required to provide in-home and community-based training for adult consumers with developmental disabilities. About Us: Rodgers Consulting Services is a state-licensed organization that provides support services to individuals with developmental disabilities, enabling them to live independently and reach their full...


  • Mountain View, California, United States Databricks Full time

    At Databricks, we are dedicated to fostering a diverse and inclusive environment where everyone can thrive. As a Performance Specialist, you will have the opportunity to work on complex projects and collaborate with multiple teams across the company. This role requires a strong background in performance analysis, software development, and distributed systems....


  • Mountain View, California, United States Waymo Full time

    About This Opportunity: We're seeking a talented developer to join our ML Infrastructure team as a Distributed Training Specialist. In this role, you will play a key part in developing the infrastructure components necessary for distributed training, including job scheduling, resource management, data distribution, and model synchronization. Your Key...


  • Mountain View, California, United States Expert Seekers Training Full time

    About Our Company: Expert Seekers Training is a recognized leader in the industry, known for its Top Company Culture and rapid growth. Our employees consistently rate us highly on platforms like Glassdoor and Indeed. Job Description: You will be responsible for collaborating closely with mentors and operating as part of a cohesive team. This includes engaging...


  • Mountain View, California, United States TikTok Full time

    Job Summary: The Machine Learning Infrastructure Specialist will be responsible for designing and implementing the infrastructure for TikTok's machine learning models. This role requires expertise in distributed systems, data engineering, and cloud computing. Key Responsibilities: Design and develop scalable data pipelines for machine learning model training and...


  • Mountain View, California, United States Moveworks Full time

    We are looking for a Cloud System Specialist to join our Core Infrastructure team. As a senior member of this team, you will be responsible for architecting the next generation of the Moveworks AI infrastructure. Your expertise in designing and building scalable, reliable, and resilient foundational services will enable our product to scale seamlessly and...