Software Engineer for AI Infrastructure Optimization

1 month ago


Palo Alto, California, United States Tesla Full time

As a Software Engineer at Tesla, you will focus on optimizing and scaling our neural network training and auto-labeling infrastructure for Autopilot and the Humanoid robot. Our autonomy capabilities rely on multiple neural networks that the Deep Learning team designs to train on large amounts of data across GPU clusters and our supercomputer Dojo. Reducing training time and increasing efficiency are critical to our mission. Your responsibilities will include identifying bottlenecks in the ML stack, integrating efficient code with the training framework, and optimizing workloads for hardware utilization.

  • Profile our workloads to increase training efficiency and implement solutions to reduce wall clock time to convergence.
  • Develop efficient, low-level code for the overall high-level training framework.
  • Collaborate with the Deep Learning team to optimize workloads for efficient hardware utilization.

Requirements

  • Extensive experience in CUDA kernel programming and pushing GPUs to their limits.
  • Strong knowledge of deep learning frameworks, preferably PyTorch.
  • Proficiency in system-level software, hardware-software interactions, and resource utilization.
  • Experience with high-performance networking and Triton.

Benefits

  • Competitive pay and comprehensive benefits package, including medical, dental, vision, and 401(k) plans.
  • Flexible time off and paid holidays.
  • Access to Tesla's Employee Assistance Program and Commuter benefits.


  • Palo Alto, California, United States Inflection AI Full time

    Company Overview: Inflection AI is a public benefit corporation leveraging our world-class large language model to build the first AI platform focused on enterprise needs. We are passionate about building innovative solutions, enjoy working together, and strive to hire individuals with diverse backgrounds and experience. We value and support our...


  • Palo Alto, California, United States Foundry Full time

    Foundry LLC is a pioneering company transforming the way AI companies access compute power. Our mission is to orchestrate the world's compute capacity, making it easier to use and optimized for AI workloads. We're building a new type of public cloud designed specifically for AI, where accessing high-performance compute is as simple and reliable as flipping a...


  • Palo Alto, California, United States Tesla Full time

    **Tesla's Autopilot AI Team: A Hub for Machine Learning Excellence** We are seeking a highly skilled **Software Engineer - Model Scaling, Autopilot AI** to join our team at Tesla. As a key member of our Autopilot AI team, you will play a crucial role in optimizing and scaling our neural network training infrastructure. You will work closely with a specialized...


  • Palo Alto, California, United States Inflection AI Full time

    Role Overview: At Inflection AI, we are building the first AI platform focused on enterprise needs. We are seeking a skilled Machine Learning Software Engineer to join our team. About Us: We are a public benefit corporation leveraging our world-class large language model to drive innovation in the field of artificial intelligence. Our leadership team is comprised...


  • Palo Alto, California, United States Tesla Full time

    Job Summary: Tesla is seeking a talented Cloud Systems Architect to design and implement our AI infrastructure, ensuring seamless operations for Full Self-Driving (FSD), Tesla Bot & Dojo engineering teams. As a key member of the team, you will play a vital role in managing AI infrastructure, monitoring compute/GPU/network metrics, Linux troubleshooting &...


  • Palo Alto, California, United States Latitude AI LLC Full time

    Latitude AI LLC, an automated driving technology company, is developing a hands-free, eyes-off driver assist system for next-generation Ford vehicles at scale. Our mission is to reimagine what it's like to drive and make travel safer, less stressful, and more enjoyable for everyone. We're seeking a highly skilled Simulation Software Engineer - AI Autonomy to...


  • Palo Alto, California, United States Tesla Full time

    The Tesla AI Infrastructure team is looking for an experienced AI Infrastructure Specialist to join our team. As a key member of the team, you will be responsible for maintaining and improving our AI infrastructure, which includes virtual simulations, Autopilot hardware, silicon design, and Dojo. With the rapidly growing need for more data and optimized...


  • Palo Alto, California, United States Foundry Technologies, Inc. Full time

    Company Overview: About Foundry Technologies, Inc. We are transforming the way AI companies access compute power. Our mission is to orchestrate the world's compute capacity, making it easier to use and optimized for AI workloads. We're building a new type of public cloud, one designed specifically for AI, where accessing high-performance compute is as simple and...

  • Software Engineer

    3 weeks ago


    Palo Alto, California, United States Qualified Health Full time

    Company Overview: Qualified Health is a pioneering healthcare startup leveraging artificial intelligence to revolutionize patient care and outcomes. Our mission is to create cutting-edge solutions that improve lives and push the boundaries of what's possible in healthcare. Salary Range: The salary for this role is estimated to be between $180,000 and $220,000...


  • Palo Alto, California, United States Luma AI Full time

    About the Role: We're looking for a talented Backend Engineering Lead to drive the development of high-performance data processing systems at Luma AI. This role requires expertise in designing and building scalable infrastructure for large-scale data processing, leveraging thousands of GPUs. Main Responsibilities: Design and develop efficient data processing...


  • Palo Alto, California, United States Tesla Full time

    Responsibilities: As an HPC Engineer, your key responsibilities will be: Spearheading AI/ML cluster infrastructure support on both GPU and Dojo platforms, emphasizing systems automation, configuration management, and large-scale deployment. Enhancing monitoring & self-healing pipelines, along with security protocols. Collaborating with hardware and storage...


  • Palo Alto, California, United States Tesla Full time

    Tesla is seeking an experienced Machine Learning Infrastructure Specialist to join our team. As a key member of our team, you will be responsible for optimizing and scaling our neural network training infrastructure. This is an exciting opportunity to collaborate with world-class ML Researchers and Engineers and contribute to the development of cutting-edge...


  • Palo Alto, California, United States Lutra AI Full time

    About Lutra AI: Lutra AI is a pioneering technology company that empowers individuals to harness the full potential of AI and maximize their personal productivity. We are a tight-knit team based in the San Francisco Bay Area, renowned for our expertise in AI innovation. Are you passionate about learning and applying the latest AI technologies to create...


  • Palo Alto, California, United States Machinify Full time

    Exciting Opportunity at Machinify: Machinify is revolutionizing the healthcare industry with its cutting-edge AI platform, transforming claims and payment operations. Our innovative solutions have already made a significant impact, saving billions of dollars in mispayments each year. We are seeking a highly skilled Staff Software Engineer, Backend to join our...


  • Palo Alto, California, United States Luma AI Full time

    Luma AI is looking for a skilled Senior Software Engineer to join our applied research team. As a key member of our team, you will design, build, and automate infrastructure for processing large-scale data across multiple clusters of thousands of GPUs. Your expertise in Backend Data Engineering will be crucial in building highly efficient, resilient systems...


  • Palo Alto, California, United States Machinify, Inc. Full time

    About Machinify, Inc.: Machinify, Inc. is a leading provider of innovative AI-powered software products that transform healthcare claims and payment operations. With a mission to revolutionize the industry, we empower our customers to increase efficiency, accuracy, and customer satisfaction. We are seeking an experienced Cloud Engineer for AI/ML to join our...


  • Palo Alto, California, United States Machinify, Inc. Full time

    Transforming Healthcare with AI: Machinify, Inc. is dedicated to revolutionizing the healthcare industry through innovative AI-powered software products. Our mission is to increase the speed and accuracy of claims processing, reducing waste and improving outcomes for patients, providers, and payers. As a Staff Software Engineer, Backend, you will play a...


  • Palo Alto, California, United States Tesla Full time

    Job Description: As a key member of the AI group at Tesla, you will be responsible for enhancing and optimizing our neural network training infrastructure to support both Autopilot and the Humanoid robot. At the core of our autonomy capabilities are multiple neural networks designed by the Deep Learning team to train on large amounts of data across GPU...


  • Palo Alto, California, United States Foundry Technologies, Inc. Full time

    Foundry Technologies, Inc. is a cutting-edge technology company transforming the way AI companies access compute power. Our mission is to orchestrate the world's compute capacity, making it easier to use and optimized for AI workloads. We're looking for an experienced security engineer to lead the development of our security infrastructure from the ground up,...


  • Palo Alto, California, United States Tesla Full time

    Company Overview: Tesla is a pioneering electric vehicle and clean energy company that is revolutionizing the transportation industry. As a leader in autonomous driving technology, we are seeking highly skilled engineers to join our team. Job Summary: We are looking for a talented Software Engineer with expertise in performance optimization of AI infrastructure...