Machine Learning Infrastructure Specialist

2 days ago


Palo Alto, California, United States Tesla Full time

As a member of Tesla's cutting-edge team, you will play a pivotal role in optimizing and scaling our neural network training infrastructure. You will collaborate closely with world-class ML Researchers and Engineers to tackle unique challenges at the intersection of AI and ML training accelerators.

Key Responsibilities
  • Work with machine learning Researchers and Engineers to run FSD models on our in-house ML training accelerator.
  • Profile the performance of training workloads in our cluster, identify bottlenecks in and between CPU and Dojo code execution, and optimize throughput and scalability within and across nodes to ultimately reduce convergence time.
  • Coordinate with the team managing the hardware cluster to maintain high availability and job throughput for machine learning.
  • Integrate the training software into our continuous integration cluster to support metrics persistence across experiments, weekly/nightly neural network builds, and other unit/throughput tests.

Requirements
  • Degree in Engineering, Computer Science, or equivalent experience and evidence of exceptional ability.
  • Practical experience programming in Python and/or C++.
  • Experience working with training frameworks, ideally PyTorch.
  • Proficiency in system-level software, in particular hardware-software interactions and resource utilization.
  • Understanding of modern machine learning concepts and state-of-the-art deep learning.
  • Experience profiling and optimizing CPU-accelerator interactions (pipelining compute and transfers, etc.).
  • DevOps experience, in particular with clusters of training nodes and filesystems for very large amounts of training data.

Compensation and Benefits

Tesla offers a competitive salary range of $120,000 - $318,000 per annum, plus cash and stock awards, and comprehensive benefits including medical, dental, and vision plans, 401(k) with employer match, Employee Stock Purchase Plans, and more.
