Distributed Training Infrastructure Specialist
7 days ago
Company Overview
Waymo is a pioneering autonomous driving technology company dedicated to creating the world's most trusted driver. With its roots in the Google Self-Driving Car Project, Waymo has been working tirelessly since 2009 to build the Waymo Driver, an AI system designed to improve access to mobility while saving countless lives lost to traffic accidents. The Waymo Driver powers Waymo One, a fully autonomous ride-hailing service, and can be applied to various vehicle platforms and product use cases.
The Waymo Driver has completed over one million rider-only trips, drawing on experience from autonomously driving tens of millions of miles on public roads and billions of miles in simulation across 13+ U.S. states.
Salary
We offer a competitive salary range of $158,000 to $200,000 USD for this full-time position across US locations. Your actual starting pay will be based on job-related factors, including your work location, experience, relevant training, education, and skill level.
Job Description
In this hybrid role, you will report to the Technical Lead Manager of Machine Learning Training and be responsible for developing the infrastructure components necessary for distributed training, including job scheduling, resource management, data distribution, and model synchronization. You will implement automation solutions for provisioning, deployment, monitoring, and scaling of distributed training infrastructure to improve operations and reliability.
You will monitor system health, diagnose and troubleshoot issues, and perform routine maintenance tasks to ensure the reliability of the distributed training infrastructure. Additionally, you will identify performance bottlenecks and optimization opportunities, and improve the developer experience and performance of our scalable ML framework.
Required Skills and Qualifications
- Bachelor's degree in Computer Science, Engineering, or a related field, or 2+ years of equivalent experience
- Experience with distributed systems principles and with building distributed systems for production environments
- Solid Python or C++ skills
- Prior experience with Machine Learning frameworks (e.g., TensorFlow, PyTorch) and distributed training algorithms
- Ability to debug complex distributed systems issues
- Experience communicating updates and resolutions to customers and other partners
Benefits
As a Waymo employee, you are eligible to participate in our discretionary annual bonus program, equity incentive plan, and generous Company benefits program, subject to eligibility requirements.
-
Distributed Training Infrastructure Engineer
7 days ago
Mountain View, California, United States Waymo Full time. About Waymo: Waymo is an innovative autonomous driving technology company with a mission to provide the most trusted driver. Our team has been focused on building the Waymo Driver, the world's most experienced driver, to improve access to mobility and save thousands of lives lost to traffic crashes. The Waymo Driver powers our fully autonomous ride-hailing...
-
Distributed Training Systems Engineer
4 days ago
Mountain View, California, United States Waymo Full time. About the Company: Waymo is a leader in autonomous driving technology, working to improve access to mobility while saving thousands of lives. Since 2009, we've focused on building the Waymo Driver, the world's most experienced driver, using cutting-edge artificial intelligence and machine learning algorithms. Job Summary: We're seeking an experienced...
-
Reliability Engineer
6 days ago
Mountain View, California, United States ThisWay Full time. Job Overview: ThisWay is seeking a highly skilled Reliability Engineer - Infrastructure Specialist in Mountain View, CA. The role involves supporting the design, build, and maintenance of multi-cloud platforms, including cloud-hosted and serverless services. Key Responsibilities: Lead technical initiatives to enhance the reliability of global infrastructure...
-
Network Infrastructure Specialist
6 days ago
Mountain View, California, United States Saxon Global Full time. Saxon Global is seeking a skilled Network Infrastructure Specialist to join our team. As the world's first and only Lakehouse platform in the cloud, we combine the best of data warehouses and data lakes to offer an open and unified platform for data and AI. About Saxon Global: We are a leading provider of innovative solutions for businesses looking to leverage...
-
Mountain View, California, United States Waymo Full time. About Waymo: Waymo is a pioneering autonomous driving technology company dedicated to revolutionizing the way people move. With a mission to be the most trusted driver, we have been at the forefront of this industry since its inception as the Google Self-Driving Car Project in 2009. Our journey has been marked by significant milestones, including providing over...
-
Highly Skilled Infrastructure Specialist
7 days ago
Mountain View, California, United States Red Oak Technologies Full time. Red Oak Technologies, a leading provider of comprehensive resourcing solutions across various industries and sectors, seeks a highly skilled infrastructure specialist to join their team. Based in Mountain View, CA, the ideal candidate will have at least 8 years of experience in infrastructure engineering with expertise in data center operations, server...
-
Senior Infrastructure Technical Specialist
6 days ago
Mountain View, California, United States Intelliswift Full time. Job Overview: We are seeking a highly skilled and experienced Senior Infrastructure Technical Specialist to join our team at Intelliswift in Mountain View, CA. Key Responsibilities: Design, implement, maintain, and optimize infrastructure systems across multiple platforms including data centers, server hardware, operating systems, virtualization technologies,...
-
Senior Cloud Infrastructure Specialist
5 days ago
Mountain View, California, United States iSoftTek Solutions Inc Full time. Job Description: We are seeking a highly skilled Senior Cloud Infrastructure Specialist to join our team at iSoftTek Solutions Inc. in Mountain View, CA. About the Role: This is a long-term W2 job that requires strong experience in stream/batch processing systems at scale. You will be responsible for designing, implementing, and maintaining large-scale cloud...
-
Machine Learning Infrastructure Developer
6 days ago
Mountain View, California, United States NewsBreak Full time. About NewsBreak: NewsBreak is revolutionizing the way users interact with local news and their communities by bridging local users, content creators, and businesses. We foster safer, more vibrant, and authentically connected lives through robust collaborations with thousands of local publishers and businesses across the nation. Our Mission: We are redefining the...
-
Mountain View, California, United States CV Library Full time. Job Overview: We are seeking a highly skilled Cloud Infrastructure Specialist to join our team at CV Library. This is a 12+ month contract opportunity that requires expertise in machine learning infrastructure, cloud platforms, and containerization technologies. Key Responsibilities: Design and implement scalable machine learning infrastructure on Google Cloud...
-
Mountain View, California, United States LinkedIn Full time. Role Overview: At LinkedIn, we're building a platform that helps professionals achieve more in their careers. As a Systems & Infrastructure Engineering Intern, you will play a key role in building and supporting large-scale systems, and utilizing distributed systems and algorithms to help scale LinkedIn's infrastructure to handle massive growth in membership,...
-
Cloud Infrastructure Developer
6 days ago
Mountain View, California, United States Joyent Full time. About the Role: We are seeking a talented Cloud Infrastructure Developer to join our dynamic DX (Developer Experience) Team at Joyent. As a key member of this team, you will play a crucial part in designing and developing the next generation Kubernetes cloud web application for our cloud-based platform. Key Responsibilities: System Design and Architecture:...
-
Machine Learning Infrastructure Engineer
3 weeks ago
Mountain View, California, United States Nuro Full time. About Nuro: Nuro exists to better everyday life through robotics. Founded in 2016, we have developed autonomous driving (AD) technology and commercialized AD applications. Our world-class autonomous driving system combines AD hardware with our generalized AI-first self-driving software. Our system is built to learn and improve through data and is one of the...
-
Mountain View, California, United States iSoftTek Solutions Inc Full time. Job Summary: We are seeking a highly skilled Senior Cloud Machine Learning Infrastructure Specialist to join our team at iSoftTek Solutions Inc in Mountain View, CA. This long-term W2 opportunity requires strong experience in machine learning infrastructure and cloud platforms such as GCP. Proficiency in programming languages like Python and Java is essential....
-
Infrastructure Systems Lead
6 days ago
Mountain View, California, United States DeepMind Full time. We're seeking an experienced Infrastructure Engineer to lead and drive technology projects in our workplace. As an Enterprise Engineering Manager, you will be responsible for the planning, maintenance, and security of a diverse range of systems, including Quantum Stornext storage arrays, hypervisors, Active Directory Servers, Linux, Windows, and Mac...
-
Network Infrastructure Specialist
6 days ago
Mountain View, California, United States Woongjin, INC. Full time. Job Description. Company Overview: WOONGJIN, Inc. is a rapidly growing team that provides exceptional services to our clients. We are motivated by a strong sense of responsibility and servant leadership. Job Summary: We are seeking a skilled Network Administrator to join our team. The ideal candidate will have experience in datacenter network operation, data...
-
Senior Map Infrastructure Engineer
7 days ago
Mountain View, California, United States Applied Intuition Full time. About the Role: We are seeking a Senior Engineer to own and maintain our High-Definition (HD) maps infrastructure. Our product suite utilizes HD maps to solve customer needs in various applications, including localized information calculation, global map querying, and data visualization. This role offers an opportunity to shape the future of our maps by...
-
Senior Machine Learning Engineer, Training
3 weeks ago
Mountain View, California, United States Waymo Full time. About the Role. Job Summary: Waymo is an autonomous driving technology company with a mission to be the most trusted driver. As a member of the Machine Learning Infrastructure team, you will work closely with Research and Production teams to develop models in Perception and Planning that are core to our autonomous driving software. About the Team: The team focuses...
-
Mountain View, California, United States Turnblock Full time. About the Role: We are seeking a highly skilled Ethereum Developer to join our remote team in the US. This is an excellent opportunity to be part of a cutting-edge technology project and contribute to the development of a Blockchain Distribution Network (BDN). Key Responsibilities: Design, implement, test and deploy new features in short cycles to an always-on...
-
Mountain View, California, United States ID Full time. ID.me is a pioneering enterprise software company that simplifies identity verification and sharing online. We empower individuals to control their data through a secure, portable login, eliminating the need for multiple passwords across websites. Our digital identity network boasts 117 million registered members, used by 14 federal agencies, 30 states, and...