Principal Software Engineer, ML Infrastructure

2 weeks ago


Foster City, CA, United States | Zoox | Full time

Zoox is on a mission to reimagine transportation and to build, from the ground up, autonomous robotaxis that are safe, reliable, clean, and enjoyable for everyone. We are still in the early stages of deploying our robotaxis, so it's a great time to join Zoox and make a significant impact on this mission. The ML Infrastructure team at Zoox plays a crucial role in enabling innovation in machine learning (ML) and computer vision (CV), and in making autonomous driving as seamless as possible.

The Opportunity

We are seeking a deeply technical, influential, and hands-on Principal Software Engineer to shape and build our next-generation ML Infrastructure and to significantly reduce the time it takes to develop large-scale ML and foundation models and deploy them to our robotaxis. You will lead the design and development of our Data, Compute, Model Training, and Serving Infrastructure. You will work across all AI teams within Zoox, including Perception, Prediction, Planner, Simulation, and Collision Avoidance, and will have the opportunity to push the boundaries of how ML is practiced within Zoox.

We build and operate the data infrastructure responsible for ingesting petabytes of sensor data, along with the systems used to assemble training datasets. We operate the compute infrastructure that powers Zoox's model training, serving, and large-scale validation pipelines across tens of thousands of GPUs. We also operate the base layer of ML tools, deep learning frameworks, and inference systems that our applied research teams use for in-vehicle and off-vehicle ML use cases. You will lead a team of strong software engineers and act as a force multiplier across our teams.

In this role, you will:

    • Vision: Develop and execute a strategic vision for ML Infrastructure that will unlock innovation in autonomous driving and enhance our rider experience.
    • Technical acumen: Lead the design and implementation of cutting-edge infrastructure spanning every stage of the ML lifecycle, from data preparation and training to evaluation, deployment, and serving.
    • Partnership: Collaborate closely with cross-functional teams, including ML researchers, software engineers, data engineers, and hardware engineers, to define requirements and align on architectural decisions.
    • Mentorship: Help the engineers on the team grow their careers by providing technical guidance and mentorship.

Qualifications

    • Experience building and managing large-scale ML infrastructure that powers the development of large-scale ML models.
    • Excellent leadership skills with a demonstrated ability to lead high-performing engineering teams.
    • Strong experience with training frameworks such as PyTorch and JAX, including using GPUs efficiently for distributed model training.
    • Experience with GPU-accelerated inference using TensorRT, Ray Serve, or similar frameworks.
    • Proficient in Python and/or C++.

Bonus Qualifications

    • Experience enabling the development and deployment of large-scale foundation models.
    • Experience working on large-scale data infrastructure and big data processing frameworks such as Apache Spark.
    • Experience working in the autonomous vehicle (AV) domain supporting Perception, Prediction, Planner, and related teams.


Base Salary Range: $323,997 - $470,000 a year

There are three major components to compensation for this position: salary, Amazon Restricted Stock Units (RSUs), and Zoox Stock Appreciation Rights. A sign-on bonus may be offered as part of the compensation package. The listed range applies only to the base salary. Compensation will vary based on geographic location and level. Leveling, as well as positioning within a level, is determined by a range of factors, including, but not limited to, a candidate's relevant years of experience, domain knowledge, and interview performance. The salary range listed in this posting is representative of the range of levels Zoox is considering for this position.

Zoox also offers a comprehensive package of benefits, including paid time off (e.g. sick leave, vacation, bereavement), unpaid time off, Zoox Stock Appreciation Rights, Amazon RSUs, health insurance, long-term care insurance, long-term and short-term disability insurance, and life insurance.

About Zoox

Zoox is developing the first ground-up, fully autonomous vehicle fleet and the supporting ecosystem required to bring this technology to market. Sitting at the intersection of robotics, machine learning, and design, Zoox aims to provide the next generation of mobility-as-a-service in urban environments. We're looking for top talent that shares our passion and wants to be part of a fast-moving and highly execution-oriented team.

Accommodations

If you need an accommodation to participate in the application or interview process, please reach out to [email protected] or your assigned recruiter.

A Final Note:

You do not need to match every listed expectation to apply for this position. Here at Zoox, we know that diverse perspectives foster the innovation we need to be successful, and we are committed to building a team that encompasses a variety of backgrounds, experiences, and skills.

We may use artificial intelligence (AI) tools to support parts of the hiring process, such as reviewing applications, analyzing resumes, or assessing responses. These tools assist our recruitment team but do not replace human judgment. Final hiring decisions are ultimately made by humans. If you would like more information about how your data is processed, please contact us.
