Reliability Solutions Engineer

4 days ago


Palo Alto, California, United States Luma AI Full time

**Job Overview**

Luma AI is seeking a highly skilled Reliability Solutions Engineer to join our team. As a key member of our Infrastructure and Research teams, you will be responsible for ensuring the health and reliability of our GPU clusters.

We are looking for someone with a strong background in cloud infrastructure, containerization, and programming/scripting languages. Experience with Kubernetes, Terraform, or CloudFormation is a plus.

The ideal candidate will have excellent problem-solving skills, strong communication and collaboration abilities, and a passion for building scalable and fault-tolerant systems.

**Salary:** $180,000 - $250,000 per year (based on location and experience)

**Benefits:**

  • A sizable grant of Luma's equity

**Responsibilities:**

  • Collaborate with researchers and engineers to specify requirements for GPU infrastructure
  • Work with cloud providers to scale up/down, maintain, and monitor our GPUs
  • Design and implement solutions to ensure scalability and reliability
  • Implement monitoring systems to proactively identify issues
  • Participate in an on-call rotation to respond to critical incidents


  • Palo Alto, California, United States Tesla Full time

    Job Overview: Tesla is seeking a talented Mechanical Design Engineer to develop innovative and reliable mechanical solutions for our next-generation computer systems. This role requires strong expertise in mechanical design, analysis, and simulation, as well as excellent communication and collaboration skills. Key Responsibilities: Design and develop mechanical...


  • Palo Alto, California, United States Tesla Full time

    **About the Role:** Tesla is looking for a highly motivated Reliability Engineering Professional to join our team. As a key member of our engineering group, you will play a crucial role in ensuring the reliability of our innovative products. This position offers an exciting opportunity to contribute to the development of cutting-edge technology and shape the...


  • Palo Alto, California, United States Tesla Full time

    Job Description: We are seeking an experienced Electronics Reliability Specialist to join our team at Tesla. As a key member of our energy storage and electronics reliability team, you will play a critical role in enhancing the reliability of our innovative energy solutions. You will be responsible for conducting in-depth failure analysis and investigating the...


  • Palo Alto, California, United States Tesla Full time

    Company Overview: Tesla is a leading electric vehicle manufacturer accelerating the world's transition to sustainable energy. Our mission-critical systems enable our engineers to design and develop innovative solutions. Job Summary: We are seeking a highly skilled Site Reliability Engineer to join our Design Technology Operations team. This position will be...


  • Palo Alto, California, United States Wing Inflatables, Inc. Full time

    About Wing: We are a technology company pushing the boundaries of drone delivery. Our mission is to create a scalable and sustainable solution for last mile logistics. Our team is dedicated to designing and building highly automated delivery drones, which transport small packages directly from businesses to homes on-demand, in minutes. We operate our aircraft...


  • Palo Alto, California, United States Wing Inflatables, Inc. Full time

    About Wing: We are a pioneer in drone delivery technology, offering a safe, fast, and sustainable solution for last mile logistics. Our mission is to create the preferred means of delivery for the planet by building a workforce that's representative of the global communities we serve. Our Design for Excellence (DFX) team in Palo Alto, California, is seeking a...


  • Palo Alto, California, United States Assured Full time

    About Assured: At Assured, we modernize insurance by providing software solutions to large insurers. We empower them to win in a technology-driven world with self-service claim filing software and backend fraud detection. Job Overview: We are looking for a Site Reliability Engineer to join our team. The ideal candidate will have experience working in a...

  • Reliability Expert

    1 week ago


    Palo Alto, California, United States Wing Aviation Full time

    About Wing Aviation: We're revolutionizing last-mile logistics with drone delivery. Our technology is designed to be easy to integrate into existing networks, offering a scalable solution for businesses worldwide. Job Overview: We're seeking a Reliability Engineer to join our Hardware Reliability team in Palo Alto, CA. Responsibilities: Define and execute design...


  • Palo Alto, California, United States Plume Full time

    About the Job: The Technical Manager will lead a team of Site Reliability Engineers, providing technical guidance and oversight. Key responsibilities include: Supervise a team of Site Reliability Engineers who provide first-line support to Customer Clouds. Attend and conduct customer meetings for project and roadmap specification. Manage growth and performance of...


  • Palo Alto, California, United States Tesla Full time

    We are looking for an exceptional Mechanical Reliability Engineer to join our Design for Reliability team at Tesla. As a key member of this team, you will be responsible for designing reliability into the mechanical components and sub-systems of our Tesla Bot. What You'll Do: Assess Product Risks and Identify Failure Modes: Work in cross-functional settings to...


  • Palo Alto, California, United States InDepth Engineering Solutions, LLC Full time

    About InDepth Engineering Solutions, LLC: We are a leading provider of cutting-edge autonomy hardware solutions, pushing the boundaries of innovation in the automotive industry. Our team is dedicated to delivering high-quality test software and validation frameworks that ensure the reliability and performance of our SoC solutions.


  • Palo Alto, California, United States Tesla Full time

    Job Description: We are seeking an experienced Electrical Engineer to join our team at Tesla, focusing on designing and developing innovative power electronics solutions for our powertrains and energy products. The ideal candidate will have a strong understanding of power converter topologies, high-voltage power semiconductor selection, magnetics, and...


  • Palo Alto, California, United States Luma AI Full time

    **Job Description** We are seeking a highly skilled AI/ML System Reliability Expert to join our team at Luma AI. As a key member of our Infrastructure and Research teams, you will be responsible for ensuring the health and reliability of our GPU clusters. The ideal candidate will have a strong background in AI/ML system reliability, cloud infrastructure, and...


  • Palo Alto, California, United States Amazon Full time

    Overview: Amazon Advertising is dedicated to driving measurable outcomes for brand advertisers, agencies, authors, and entrepreneurs. Our ad solutions leverage Amazon's innovations and insights to find, attract, and engage intended audiences throughout their daily journeys. Salary Range: The base pay for this position ranges from $151,300/year in our lowest...


  • Palo Alto, California, United States Plume Full time

    About the Company: Plume is a leader in the smart home and small business market, delivering services to over 50 million locations globally. Our software-defined network platform allows CSPs to decouple their service offerings from hardware and rapidly curate and deliver new services over a multi-vendor, open-platform architecture. We're looking for a seasoned...


  • Palo Alto, California, United States oilandgas Full time

    Job Description: In this challenging role as a Senior Reliability Specialist, you will play a pivotal part in driving exceptional reliability into Tesla's energy systems. Responsibilities: Develop and communicate reliability targets for site, product, subsystem, and components to ensure seamless integration. Create Fault Trees and reliability block diagrams to...


  • Palo Alto, California, United States Tesla Full time

    Job Overview: In this role as a Senior Electronics Reliability Engineer, you will play a key part in enhancing the reliability of our innovative Energy and Charging products. You will be responsible for conducting in-depth failure analysis and investigating the underlying mechanisms of electronic failures within our Industrial Energy, Residential Energy,...


  • Palo Alto, California, United States Tesla Full time

    Job Summary: We are seeking an experienced Industrialization Engineer to join our Energy Electromechanical team at Tesla. In this role, you will be responsible for the industrialization of critical components, ensuring they meet the highest quality and reliability standards. Main Responsibilities: Industrialization Activities: Plan, organize, and direct...


  • Palo Alto, California, United States Tesla Full time

    Role Description: This is a challenging opportunity to work with cutting-edge technology and contribute to the development of automation tools. As a Site Reliability Engineer, you will drive root cause analysis of system failures, manage containerization technology, and maintain site performance using various tools. Expected Compensation: The estimated annual...


  • Palo Alto, California, United States Tesla Full time

    Role Overview: Tesla is seeking a highly skilled Cell Reliability Specialist to join our team. As a key member of our engineering group, you will play a crucial role in developing and implementing strategies to ensure the reliability and performance of our battery cells. In this role, you will be responsible for guiding the development of new cell technologies...