Current jobs related to Software Engineer, ML Infrastructure, Dojo - Palo Alto, California - Tesla


  • Palo Alto, California, United States Tesla Full time

    We are seeking a highly skilled Software Engineer to contribute to the development of our Dojo Datacenter Platform. As a key member of our infrastructure team, you will design, develop, and deploy software that ensures the reliability, availability, and scalability of our datacenter operations. You will have a strong focus on network infrastructure and...


  • Palo Alto, California, United States Tesla Full time

    As a Software Engineer at Tesla, you will focus on optimizing and scaling our neural network training and auto-labeling infrastructure for Autopilot and the Humanoid robot. Our autonomy capabilities rely on multiple neural networks that the Deep Learning team designs to train on large amounts of data across GPU clusters and our supercomputer Dojo. Reducing...


  • Palo Alto, California, United States Match Group Full time

    About the Role: We are seeking a highly skilled Sr. Software Engineer to join our Machine Learning Infrastructure team at Tinder. As a key member of our team, you will be responsible for designing and developing robust and scalable infrastructure to support the diverse needs of machine learning engineers across all Tinder business units. Key...


  • Palo Alto, California, United States Match Group Full time

    About the Role: We are seeking a highly skilled Sr. Software Engineer to join our Machine Learning Infrastructure team. As a key member of this team, you will be responsible for designing and developing scalable infrastructure to support the diverse needs of machine learning engineers across all Tinder business units. Key Responsibilities: Build robust and...


  • Palo Alto, California, United States Pinterest Full time

    About Pinterest: We're looking for a talented Staff Software Engineer to lead the Ads ML foundation evolution movement, driving 2x Pinterest revenue and 5x ad performance in the next 3 years. This role will involve using cutting-edge ML technologies, including GPUs, LLMs, vector search, and data processing systems, to empower 100x bigger models in the next 3...


  • Palo Alto, California, United States Machinify Full time

    About the Role: Machinify is a leading provider of AI-powered software products that transform healthcare claims and payment operations. The company's revolutionary AI-platform has enabled the development and deployment of industry-specific products that increase the speed and accuracy of claims processing by orders of magnitude. We're seeking a Staff Software...


  • Palo Alto, California, United States Tesla Full time

    As a key member of Tesla's Autopilot AI team, you will play a pivotal role in optimizing and scaling our neural network training infrastructure. You will collaborate with a specialized team of machine learning experts and have access to one of the world's largest model training clusters. Your primary focus will be to design, implement, and maintain...

  • Software Engineer

    4 weeks ago


    Palo Alto, California, United States Rubrik Full time

    About the Team: The Rubrik Engineering team is made up of individuals who produce extraordinary results. Our engineers are driven to build efficient, reliable, and cost-effective products. We believe in empowering our teams, giving engineers autonomy and responsibility, not just tasks. Our goal is to motivate and challenge you to do the best work of your...


  • Palo Alto, California, United States Snapchat Full time

    About the Role: We are seeking a highly skilled Staff Software Engineer to join our ML Feature Generation Team at Snap Inc. The successful candidate will drive technical direction for the team to accelerate ML iteration speed and improve system performance and efficiency. Key Responsibilities: Drive technical direction for the team to accelerate ML iteration...


  • Palo Alto, California, United States Tesla Full time

    Job Summary: As a Software Engineer within the AI group at Tesla, you will play a critical role in reinforcing, optimizing, and scaling our neural network training and auto-labeling infrastructure for both Autopilot and the Humanoid robot. Key Responsibilities: Reduce wall-clock time to convergence of our training jobs by identifying bottlenecks in the ML stack,...


  • Palo Alto, California, United States Tesla Full time

    Job Summary: We are seeking a highly skilled Software Engineer to join our Autonomy team at Tesla. As a Software Engineer, you will contribute to the development of our AI inference and runtime stack, working closely with AI Engineers and Hardware Engineers to build the frameworks and infrastructure that enable the seamless deployment, integration, and...


  • Palo Alto, California, United States Woven by Toyota Full time

    Job Summary: We are seeking a highly skilled Senior Software Engineer to join our Machine Learning Platform team at Woven by Toyota. As a key member of our team, you will be responsible for developing and integrating cutting-edge machine learning methods for efficient, large-scale training of ML models and supporting multi-platform deployment, including...


  • Palo Alto, California, United States Tesla Full time

    Tesla's Dojo team is seeking a highly skilled VLSI engineer to design and integrate SOCs, IP, circuits, tool flows, and methodologies into systems using advanced technologies. This position entails leading large design blocks and SOCs from early design stage to tape out, floorplanning and partitioning designs to meet area, timing, and power requirements, and...

  • AI/ML Leader

    4 weeks ago


    Palo Alto, California, United States AISERA Full time

    Aisera is a leading provider of AI Copilot solutions, utilizing AiseraGPT and Generative AI to facilitate business transformation and drive revenue growth through a self-service model. Aisera's AI Copilot uses industry and domain-specific LLMs to deliver human-like experiences and auto-remediate requests through AI workflows. With 400 integrations and 1200...

  • AI/ML Leader

    4 weeks ago


    Palo Alto, California, United States Aisera Full time

    Aisera is a leading provider of AI Copilot solutions, utilizing AiseraGPT and Generative AI to facilitate business transformation and drive revenue growth through a self-service model. Aisera's AI Copilot uses industry and domain-specific LLMs to deliver human-like experiences and auto-remediate requests through AI workflows. With 400+ integrations and 1200+...

  • AI/ML Director

    1 month ago


    Palo Alto, California, United States TBWA\Chiat\Day Full time

    About Aisera: Aisera is a leading provider of AI Copilot solutions, utilizing AiseraGPT and Generative AI to facilitate business transformation and drive revenue growth through a self-service model. Aisera's AI Copilot uses industry and domain-specific LLMs to deliver human-like experiences and auto-remediate requests through AI workflows. With 400+...


  • Palo Alto, California, United States Foundry Technologies, Inc. Full time

    About Foundry: Foundry Technologies, Inc. is a leading provider of AI infrastructure solutions. We are seeking a highly skilled Senior Infrastructure Reliability Engineer to join our team. Job Summary: We are looking for a talented engineer to design, deploy, and maintain our AI infrastructure. The ideal candidate will have a strong background in cloud...


  • Palo Alto, California, United States stakefish Full time

    Job Title: DevOps Engineer. We are seeking a highly skilled DevOps Engineer to join our team at stakefish. As a DevOps Engineer, you will play a critical role in building and maintaining our blockchain infrastructure, ensuring the security, scalability, and reliability of our systems. Key Responsibilities: Design and implement secure and reliable infrastructure...


  • Palo Alto, California, United States Guardant Health Full time

    Job Description: Guardant Health is a leading precision oncology company focused on helping conquer cancer globally through the use of its proprietary tests, vast data sets, and advanced analytics. The company's HPC team builds and operates the computational technology backbone of the organization, including scalable data storage, high-performance compute...


  • Palo Alto, California, United States Rubrik Full time

    About the Team: The Rubrik Engineering team is made up of individuals who produce extraordinary results. Our engineers are driven to build efficient, reliable, and cost-effective products. We believe in empowering our teams, giving engineers autonomy and responsibility, not just tasks. Our goal is to motivate and challenge you to do the best work of your...

Software Engineer, ML Infrastructure, Dojo

1 month ago


Palo Alto, California, United States Tesla Full time

As a Machine Learning Software Engineer within Dojo, you will play a crucial role in bridging the gap between our cutting-edge Dojo training accelerator and the neural networks developed by our Autopilot ML team.

You will collaborate closely with world-class ML researchers, compiler engineers, and hardware engineers to tackle unique challenges at the intersection of AI and ML training accelerators.

Your expertise will be instrumental in optimizing and scaling our neural network training infrastructure.



**Responsibilities**

Work with machine learning researchers and engineers to run FSD models on our in-house ML training accelerator
Profile the performance of training workloads in our cluster, identify bottlenecks in and between CPU and Dojo code execution, and optimize throughput and scalability within and across nodes to reduce time to convergence
Coordinate with the team managing the hardware cluster to maintain high availability and job throughput for machine learning workloads
Integrate the training software into our continuous integration cluster to support metric persistence across experiments, weekly and nightly neural network builds, and other unit and throughput tests
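
As an illustration of the profiling responsibility above, here is a minimal Python sketch, assuming a training loop that can be split into a host-side data phase and a compute phase. `next_batch`, `train_step`, `profile_loop`, and `StepProfile` are hypothetical names for illustration, not Tesla or Dojo APIs, and the compute call is assumed to block until the accelerator finishes (an asynchronous runtime would need an explicit synchronization before reading the clock).

```python
import time
from dataclasses import dataclass, field
from typing import Any, Callable, Dict


@dataclass
class StepProfile:
    """Accumulated wall-clock seconds spent in each phase of a training step."""
    totals: Dict[str, float] = field(
        default_factory=lambda: {"data": 0.0, "compute": 0.0})

    def bottleneck(self) -> str:
        # The phase with the largest accumulated time dominates step time.
        return max(self.totals, key=self.totals.get)

    def fraction(self, phase: str) -> float:
        # Share of total measured time spent in `phase` (0.0 if nothing was timed).
        total = sum(self.totals.values())
        return self.totals[phase] / total if total else 0.0


def profile_loop(next_batch: Callable[[], Any],
                 train_step: Callable[[Any], None],
                 steps: int,
                 clock: Callable[[], float] = time.perf_counter) -> StepProfile:
    """Time the host-side data phase and the compute phase of each step.

    `next_batch` and `train_step` are hypothetical stand-ins for a real input
    pipeline and accelerator step; `train_step` is assumed to block until the
    work has finished, otherwise compute time would be under-counted.
    """
    prof = StepProfile()
    for _ in range(steps):
        t0 = clock()
        batch = next_batch()   # CPU side: load and preprocess the batch
        t1 = clock()
        train_step(batch)      # accelerator side (assumed synchronous)
        t2 = clock()
        prof.totals["data"] += t1 - t0
        prof.totals["compute"] += t2 - t1
    return prof


if __name__ == "__main__":
    # Simulated workload: a slow input pipeline (20 ms) and a fast step (5 ms).
    prof = profile_loop(lambda: time.sleep(0.02), lambda b: time.sleep(0.005), steps=5)
    print(prof.bottleneck(), f"{prof.fraction('data'):.0%} of time in data loading")
```

In the simulated run the data phase should dominate, which would point at the input pipeline rather than the accelerator as the throughput limit. A split like this is only a first coarse signal; a real investigation would follow up with a proper profiler per node and across nodes.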

**Requirements**

Degree in engineering or computer science, or equivalent experience and evidence of exceptional ability
Practical experience programming in Python and/or C++
Experience working with training frameworks, ideally PyTorch
Proficiency in system-level software, particularly hardware-software interactions and resource utilization
Understanding of modern machine learning concepts and state-of-the-art deep learning
Experience profiling and optimizing CPU-accelerator interactions (e.g., pipelining compute and data transfers)
DevOps experience, particularly with clusters of training nodes and filesystems for very large amounts of training data
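
Pipelining compute and transfers, as mentioned in the requirements above, generally means staging the next batch while the current one is being processed. The following is a minimal plain-Python sketch of that idea, assuming a hypothetical `transfer` callable that stands in for a host-to-accelerator copy; `prefetch` is an illustrative helper, not a Dojo API. Real frameworks provide their own asynchronous mechanisms (in PyTorch, for example, `DataLoader` options such as `pin_memory` and `prefetch_factor`, and `Tensor.to(..., non_blocking=True)`), which this sketch does not model.

```python
import queue
import threading
import time
from typing import Any, Callable, Iterable, Iterator


def prefetch(batches: Iterable[Any],
             transfer: Callable[[Any], Any],
             depth: int = 2) -> Iterator[Any]:
    """Yield transfer(b) for each b in `batches`, in order.

    Transfers run on a background thread, so up to `depth` batches can be
    staged while the caller is still computing on the current one.
    """
    staged = queue.Queue(maxsize=depth)

    def producer() -> None:
        try:
            for b in batches:
                # Blocks once `depth` batches are waiting, bounding memory use.
                staged.put(("item", transfer(b)))
        except BaseException as exc:  # forward failures to the consumer
            staged.put(("error", exc))
            return
        staged.put(("done", None))

    # Daemon thread: if the caller stops iterating early, it won't block exit.
    threading.Thread(target=producer, daemon=True).start()
    while True:
        kind, payload = staged.get()
        if kind == "done":
            return
        if kind == "error":
            raise payload
        yield payload


if __name__ == "__main__":
    def slow_transfer(x):
        time.sleep(0.05)  # stand-in for a host-to-device copy
        return x

    start = time.perf_counter()
    for batch in prefetch(range(6), slow_transfer):
        time.sleep(0.05)  # stand-in for the compute step
    print(f"pipelined: {time.perf_counter() - start:.2f}s "
          "(sequential would take at least 0.60s)")
```

With equal transfer and compute costs, overlapping them should bring the total close to one transfer plus all the compute steps, rather than the sum of both. The bounded queue (`depth`) is the design choice that keeps staged data from growing without limit when transfers outpace compute.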

**Compensation and Benefits**

Along with competitive pay, as a full-time Tesla employee, you are eligible for the following benefits from day 1 of hire:

Aetna PPO and HSA plans (2 medical plan options with a $0 payroll deduction)
Family-building, fertility, adoption and surrogacy benefits
Dental (including orthodontic coverage) and vision plans, both have options with a $0 paycheck contribution
Company-paid Health Savings Account (HSA) contribution when enrolled in the high-deductible Aetna medical plan with HSA
Healthcare and Dependent Care Flexible Spending Accounts (FSA)
LGBTQ+ care concierge services
401(k) with employer match, Employee Stock Purchase Plans, and other financial benefits
Company paid Basic Life, AD&D, short-term and long-term disability insurance
Employee Assistance Program
Sick and vacation time (flex time for salaried positions), and paid holidays
Back-up childcare and parenting support resources

Voluntary benefits include:
Critical illness, hospital indemnity, and accident insurance; theft and legal services; and pet insurance
Weight Loss and Tobacco Cessation Programs
Tesla Babies program
Commuter benefits
Employee discounts and perks program

**Expected Compensation**

$120,000 - $318,000 annual salary + cash and stock awards + benefits

Pay offered may vary depending on multiple individualized factors, including market location, job-related knowledge, skills, and experience. The total compensation package for this position may also include other elements dependent on the position offered. Details of participation in these benefit plans will be provided if an employee receives an offer of employment.