Distributed Training Specialist
6 days ago
The Waymo ML Infrastructure team is seeking an experienced Senior Machine Learning Engineer, Training to develop infrastructure components for distributed training and to automate the provisioning, deployment, monitoring, and scaling of distributed training infrastructure.
This Hybrid role requires:
- Developing the necessary infrastructure components for distributed training
- Implementing automation solutions for provisioning, deployment, monitoring, and scaling of distributed training infrastructure
- Monitoring system health, diagnosing issues, and performing routine maintenance tasks to ensure the reliability of the distributed training infrastructure
- Identifying performance bottlenecks and optimization opportunities
- Improving the developer experience and performance of our scalable ML framework
Your Skills and Experience
- Bachelor's degree in Computer Science, Engineering, or a related field, or 4+ years of equivalent experience
- Experience building distributed systems for production environments
- Solid Python or C++ skills
- Prior experience with Machine Learning frameworks (e.g., TensorFlow, PyTorch) and distributed training algorithms
Compensation and Benefits
- Competitive salary range: $192,000 - $243,000 USD
- Discretionary annual bonus program
- Equity incentive plan
- Generous Company benefits program
-
Distributed Training Infrastructure Specialist
4 weeks ago
Mountain View, California, United States Waymo Full time
Company Overview
Waymo is a pioneering autonomous driving technology company dedicated to creating the world's most trusted driver. With its roots in the Google Self-Driving Car Project, Waymo has been working tirelessly since 2009 to build the Waymo Driver, an AI system designed to improve access to mobility while saving countless lives lost to traffic...
-
Distributed Training Systems Specialist
6 days ago
Mountain View, California, United States Waymo Full time
Job Description
This Hybrid role reports to our TLM of Machine Learning Training and involves:
- Developing the infrastructure components necessary for distributed training
- Implementing automation solutions for provisioning, deployment, monitoring, and scaling of distributed training infrastructure
- Monitoring system health and performing routine maintenance tasks...
-
Senior Distributed Training Specialist
2 weeks ago
Mountain View, California, United States Waymo Full time
Job Summary
Waymo is looking for a skilled Senior Machine Learning Engineer, Training to join our Hybrid team. In this role, you will develop the infrastructure components necessary for distributed training, implement automation solutions, and monitor system health. If you have experience building distributed systems and working with Machine Learning...
-
Distributed Training Infrastructure Engineer
4 weeks ago
Mountain View, California, United States Waymo Full time
About Waymo
Waymo is an innovative autonomous driving technology company with a mission to provide the most trusted driver. Our team has been focused on building the Waymo Driver, the world's most experienced driver, to improve access to mobility and save thousands of lives lost to traffic crashes.
The Waymo Driver powers our fully autonomous ride-hailing...
-
Distributed Training Systems Engineer
3 weeks ago
Mountain View, California, United States Waymo Full time
About the Company
Waymo is a leader in autonomous driving technology, working to improve access to mobility while saving thousands of lives. Since 2009, we've focused on building the Waymo Driver, the world's most experienced driver, using cutting-edge artificial intelligence and machine learning algorithms.
Job Summary
We're seeking an experienced...
-
Distributed Training Systems Engineer
5 days ago
Mountain View, California, United States Waymo Full time
Overview: At Waymo, we're working towards a future where everyone can get where they need to go without needing a car. We're looking for a skilled Machine Learning Engineer, Training to help us achieve this goal.
Key Responsibilities: In this hybrid role, you will report to the Technical Lead Manager of Machine Learning Training. Your primary responsibilities...
-
Distributed Training Solutions Developer
2 weeks ago
Mountain View, California, United States Nuro Full time
Overview
Nuro is a cutting-edge robotics company that's changing the game with its autonomous driving technology. As a leader in the industry, we're always pushing the boundaries of innovation. Our team is passionate about developing cutting-edge solutions that make a real difference in people's lives.
We're currently looking for a skilled Machine...
-
Distributed Systems Optimization Specialist
6 days ago
Mountain View, California, United States Databricks Full time
About the Job
We're looking for a talented Distributed Systems Optimization Specialist to join our team at Databricks. In this role, you'll be responsible for optimizing the performance of our data and AI platform, ensuring it meets the needs of our customers.
The Impact You'll Have
Identify performance limitations of our entire stack based on telemetry,...
-
Training Specialist
22 hours ago
Mountain View, California, United States RODGERS CONSULTING SERVICE INC Full time
Rodgers Consulting Services is a state-licensed organization offering support to individuals with developmental disabilities, helping them achieve independence and excel in life. We are seeking a Training Specialist to join our team.
Job Summary:
To provide independent living skills training for adult consumers with developmental disabilities.
Implement...
-
Senior Distributed Systems Developer
5 days ago
Mountain View, California, United States Waymo Full time
Taking Autonomous Driving to the Next Level
At Waymo, we're pushing the boundaries of what's possible with autonomous driving technology. As a Senior Distributed Systems Developer, you'll have the chance to work on high-impact projects that drive innovation and growth.
About the Position:
Design and develop scalable distributed training infrastructure...
-
Mountain View, California, United States Waymo Full time
Job Description
Waymo is an autonomous driving technology company with the mission to become the most trusted driver. We are seeking a skilled Machine Learning Distributed Systems Developer to join our Hybrid team.
In this role, you will report to our TLM of Machine Learning Training and work closely with Research and Production teams to develop models in...
-
Distributed Systems Specialist
6 days ago
Mountain View, California, United States Qualified Technical Services Full time
Job Summary:
We are seeking a highly skilled Software Systems Engineer to join our team at NASA Ames Research Center in Mountain View, CA. This role involves developing the core infrastructure for autonomous coordination between spacecraft and ground applications.
About the Project:
The Distributed Spacecraft Autonomy project is centered out of NASA Ames...
-
USDS Training and Development Specialist
4 weeks ago
Mountain View, California, United States Tik Tok Full time
Explore an exciting opportunity at TikTok, a leading platform for short-form mobile video. As a USDS Training and Development Specialist, you will play a critical role in creating and delivering training programs that enhance the skills of content moderators worldwide.
About the Role
We are seeking a seasoned Training and Development Specialist to join our...
-
Clinical Educator and Training Specialist
4 weeks ago
Mountain View, California, United States Alameda Health System Full time
Job Summary
We are seeking a highly skilled Clinical Educator and Training Specialist to join our team at Alameda Health System. This is an exciting opportunity for a motivated individual to design, implement, and evaluate strategically aligned programs that contribute to the orientation and ongoing competence of staff.
The ideal candidate will have a strong...
-
Training Specialist
13 hours ago
Mountain View, California, United States RODGERS CONSULTING SERVICE INC Full time
Job Title
A Service Instructor is required to provide in-home and community-based training for adult consumers with developmental disabilities.
About Us
Rodgers Consulting Services is a state-licensed organization that provides support services to individuals with developmental disabilities, enabling them to live independently and reach their full...
-
Performance Specialist
6 days ago
Mountain View, California, United States Databricks Full time
At Databricks, we are dedicated to fostering a diverse and inclusive environment where everyone can thrive. As a Performance Specialist, you will have the opportunity to work on complex projects and collaborate with multiple teams across the company.
This role requires a strong background in performance analysis, software development, and distributed systems....
-
Machine Learning Infrastructure Engineer
6 days ago
Mountain View, California, United States Waymo Full time
About This Opportunity
We're seeking a talented developer to join our ML Infrastructure team as a Distributed Training Specialist. In this role, you will play a key part in developing the infrastructure components necessary for distributed training, including job scheduling, resource management, data distribution, and model synchronization.
Your Key...
-
Insurance Solutions Specialist
5 days ago
Mountain View, California, United States Expert Seekers Training Full time
About Our Company:
Expert Seekers Training is a recognized leader in the industry, known for its Top Company Culture and rapid growth. Our employees consistently rate us highly on platforms like Glassdoor and Indeed.
Job Description:
You will be responsible for collaborating closely with mentors and operating as part of a cohesive team. This includes engaging...
-
Machine Learning Infrastructure Specialist
5 days ago
Mountain View, California, United States Tik Tok Full time
Job Summary
The Machine Learning Infrastructure Specialist will be responsible for designing and implementing the infrastructure for TikTok's machine learning models. This role requires expertise in distributed systems, data engineering, and cloud computing.
Key Responsibilities
Design and develop scalable data pipelines for machine learning model training and...
-
Cloud System Specialist
6 days ago
Mountain View, California, United States Moveworks Full time
We are looking for a Cloud System Specialist to join our Core Infrastructure team. As a senior member of this team, you will be responsible for architecting the next generation of the Moveworks AI infrastructure. Your expertise in designing and building scalable, reliable, and resilient foundational services will enable our product to scale seamlessly and...