Software Engineer, Deep Learning Infrastructure

3 weeks ago


Stanford, United States Tesla Full time

**Software Engineer, Deep Learning Infrastructure - Autopilot**

Engineering & Information Technology | Palo Alto, California | ID 104044 | Full-time

**THE ROLE:**

As a Software Engineer within Autopilot, you will work on reinforcing, optimizing, and scaling our neural network training infrastructure.

At the core of our self-driving capabilities are the neural networks that the Deep Learning team designs and trains on large amounts of data. Running training jobs robustly at scale, whether for production models or quick experiments, and completing them in the shortest possible time, is critical to our mission.

**Responsibilities:**

Write robust Python code in our machine learning training repository, applying software best practices, to support machine learning scientists in tasks such as fetching training data, preprocessing it, and orchestrating training runs.

Integrate the training software into our continuous integration cluster to support metrics persistence across experiments, weekly/nightly neural network builds, and other unit / throughput tests.

Profile performance of training software in our training cluster, identify bottlenecks in and between CPU/GPU code execution, and work on optimizing its throughput and scalability within and across nodes to ultimately reduce convergence time.

Coordinate with the team managing the hardware cluster to maintain high availability and job throughput for machine learning workloads.

**Requirements:**

Experience programming in Python and/or C/C++.

Proficiency in system-level software, in particular hardware-software interactions and resource utilization.

Understanding of modern machine learning concepts and state-of-the-art deep learning.

Experience working with training frameworks, ideally PyTorch.

Demonstrated experience scaling neural network training jobs across clusters of GPUs.

Optional: Experience programming in CUDA.

Optional: Profiling and optimizing CPU-GPU interactions (pipelining compute/transfers, etc.).

Optional: DevOps experience, in particular dealing with clusters of training nodes and filesystems for very large amounts of training data.



  • Stanford, United States Tesla Full time

    **Senior Software Engineer, Deep Learning - Autopilot AI** | Engineering & Information Technology | Palo Alto, California | ID 111813 | **The Role** As a member of the Autopilot AI team you will design, implement, and optimize deep learning dataset generation, training, and evaluation tools and infrastructure to advance the state of the art in perception...


  • Stanford, United States Tesla Full time

    **Backend Software Engineer, Autopilot Infrastructure (IB)** | Engineering & Information Technology | Palo Alto, California | ID 106229 | **The Role** As a member of the Autopilot Infrastructure team, you will design and implement a diverse set of backend services and tools that power Autopilot software and processes. The systems you build will have a...


  • Stanford, United States Tesla Full time

    **Full Stack Software Engineer, Autopilot Tooling** | Engineering & Information Technology | Palo Alto, California | ID 114647 | Full-time **THE ROLE:** Tesla's Autopilot Tools team builds apps and services used in the development, debugging, and ongoing validation of the Autopilot software. Autopilot is at the forefront of self-driving, and we are...


  • Stanford, United States Catalytic Data Science Full time

    Job Description. Salary: Highly Competitive - DOE. Who You Are: REMOTE OPPORTUNITY. You are passionate about continuously delivering quality software as well as the craft of software engineering and eager to join a team of life scientists and software engineers that believe the brightest minds in research should have the best tools to leverage...


  • Stanford, United States Diverse Lynx Full time

    Role: AWS Software Engineer. Location: Palo Alto (Day 1 onsite), local candidates only. Position IDs: Job Description: The resource must work from the Palo Alto location (in office). AWS experience is a must-have, especially with the following services: SQS and Postgres Aurora. Java backend experience. Diverse Lynx LLC is an Equal Employment Opportunity employer. All...


  • Stanford, United States Tesla Full time

    **Mechanical Systems Engineer, Drive Systems** | Engineering & Information Technology | Palo Alto, California | ID 116015 | The Drive System Engineering team is looking for a world-class Mechanical System Engineer to develop for all drive system models including but not limited to: thermal, lubrication and hydraulic systems. As designs mature and...


  • Stanford, United States eTeam Full time

    Full Stack Software Engineer: Microsoft Chat Accelerator Specialist. Professional Summary: A seasoned Full Stack Software Engineer with over 5 years of comprehensive experience in developing, deploying, and optimizing web applications. Specializes in leveraging Microsoft Azure services, including Azure Cognitive Search and Azure OpenAI, to create...


  • Stanford, United States Geosite Full time

    Job Description: This position is open to US residents and citizens only. Who We're Seeking: We are looking for a Senior Cloud Operations Engineer to help (1) build and maintain our cloud infrastructure using modern orchestration tools; and (2) implement cybersecurity best practices in pursuit of our compliance objectives. You will be...

  • CAE Engineer

    3 weeks ago


    Stanford, United States Tesla Full time

    **CAE Engineer - Drive Systems** | Engineering & Information Technology | Palo Alto, California | ID 113945 | The Drive Systems team designs, optimizes, and engineers world-class EV powertrains that push the boundaries of efficiency, performance, and time to market. This can only be done with a deep understanding of engineering first principles and the...

  • Electrical Engineer

    15 hours ago


    Stanford, United States ArrayLabs, LLC Full time

    Array Labs is building a distributed radar imaging constellation to power the first accurate, real-time 3D model of the world. The Array Labs Spacecraft Bus Team oversees the design, analysis, fabrication, and integration of mechanical and electrical systems aboard Array Labs satellites, ensuring their robustness and performance. We are looking for a...


  • Stanford, United States Tesla Full time

    **System Validation Engineer Chassis and Drive Systems** | Engineering & Information Technology | Palo Alto, California | ID 112796 | **Locations** * Palo Alto, CA * Austin, TX **The Role** Tesla is looking for a highly motivated individual to join the Vehicle Software organization's Systems Validation Team with a focus on chassis and drive systems. It...


  • Stanford, United States ArrayLabs, LLC Full time

    Array Labs is building a distributed radar imaging constellation to power the first accurate, real-time 3D model of the world. We are looking for a collaborative Mechanical Engineer with a specialization in spacecraft structural analysis and thermal modeling to join our Spacecraft Bus Team. This team oversees the design, analysis, fabrication, and...

  • Electrical Engineer

    3 weeks ago


    Stanford, United States ArrayLabs, LLC Full time

    Array Labs is building a distributed radar imaging constellation to power the first accurate, real-time 3D model of the world. The Array Labs Spacecraft Bus Team oversees the design, analysis, fabrication, and integration of mechanical and electrical systems aboard Array Labs satellites, ensuring their robustness and performance. We are looking for a...


  • Stanford, United States ArrayLabs, LLC Full time

    Array Labs is building a distributed radar imaging constellation to power the first accurate, real-time 3D model of the world. We are looking for a collaborative Mechanical Engineer with a specialization in spacecraft structural analysis and thermal modeling to join our Spacecraft Bus Team. This team oversees the design, analysis, fabrication, and...


  • Stanford, United States ArrayLabs, LLC Full time

    Array Labs is building a distributed radar imaging constellation to power the first accurate, real-time 3D model of the world. We are looking for a collaborative Mechanical Engineer with a specialization in spacecraft structural analysis and thermal modeling to join our Spacecraft Bus Team. This team oversees the design, analysis, fabrication, and integration...


  • Stanford, United States Stanford University Full time

    Assess user needs and requirements. - Assist with design and development of applications that may involve sophisticated data manipulation. - Assist with maintaining and updating existing programs. - Troubleshoot and solve basic technical problems. - Software, Associate, Health, Software Engineer, Operations, Research, Technology


  • Stanford, United States ArrayLabs, LLC Full time

    Array Labs is building a distributed radar imaging constellation to power the first accurate, real-time 3D model of the world. The Array Labs Spacecraft Bus Team oversees the design, analysis, fabrication, and integration of mechanical and electrical systems aboard Array Labs satellites, ensuring their robustness and performance. We are looking for a...


  • Stanford, United States ArrayLabs, LLC Full time

    Array Labs is building a distributed radar imaging constellation to power the first accurate, real-time 3D model of the world. We are looking for a collaborative Mechanical Engineer with a specialization in spacecraft structural analysis and thermal modeling to join our Spacecraft Bus Team. This team oversees the design, analysis, fabrication, and integration...


  • Stanford, United States Teleo Inc. Full time

    Teleo is a robotics startup disrupting a trillion-dollar industry. Teleo converts construction heavy equipment, like loaders, dozers, excavators, trucks, etc. into autonomous robots. This technology allows a single operator to efficiently control multiple machines simultaneously, delivering substantial benefits to our customers while significantly enhancing...


  • Stanford, United States ArrayLabs, LLC Full time

    Array Labs is building a distributed radar imaging constellation to power the first accurate, real-time 3D model of the world. The Hardware Engineering team at Array Labs is responsible for the analysis and design of our satellite and ground-station hardware platforms. These platforms tend to be a heterogeneous mix of various subsystems like compute, memory,...