High-Performance Computing Professional

3 weeks ago


Palo Alto, California, United States Tesla Full time

As a Supercomputing Engineer, you will play a key role in maintaining and improving Tesla's AI infrastructure, ensuring our Full Self-Driving (FSD), Tesla Bot & Dojo engineering teams have the necessary tools and resources to be productive. This includes managing/operating our AI infrastructure, monitoring compute/GPU/network metrics, Linux troubleshooting & performance tuning, and collaborating with our Data Center team to coordinate the smooth operation of hundreds of servers.

Main Responsibilities:
  • Support the AI/ML cluster infrastructure on both GPU and Dojo platforms, focusing on systems automation, configuration management and deployment at scale
  • Improve our monitoring & self-healing pipelines, as well as security posture
  • Work with hardware and storage vendors to tune and optimize our server, storage and network performance
  • Performance tuning & OS provisioning on Linux systems
  • Manage HPC clusters, workloads and applications
  • Automation and systems engineering
  • Participate in 24x7 on-call rotation
Requirements:
  • Proficiency with scripting languages such as Python or Bash
  • Proficiency with Linux & network fundamentals
  • Experience with configuration management software (Ansible, etc.) and systems monitoring & alerting (Prometheus, Grafana, Telegraf, Splunk, etc.) is a plus
  • Experience with high-throughput low-latency networks, GPU-based computing systems, and/or high performance storage systems is a plus
  • Experience with Slurm, LSF and storage management of parallel file systems is a plus
  • Bachelor's Degree in Computer Science, Computer Engineering, Electrical Engineering, Physics or proof of exceptional skills in a related field
  • 3+ years of additional equivalent experience or evidence of exceptional ability related to the position

Our highly competitive salary range starts at $120,000 per year, with benefits including comprehensive health insurance, retirement plans, and more.



  • Palo Alto, California, United States Criteo Full time

    Criteo, a leader in commerce marketing, is building the highest performing and open commerce marketing ecosystem to drive profits and sales for retailers and brands. We are looking for a High-Performance Computing Specialist to join our team. As a key member of our engineering organization, you will play a critical role in designing, implementing, and...


  • Palo Alto, California, United States Foundry Technologies, Inc. Full time

    Foundry Technologies, Inc. is seeking a High-Performance Computing Engineer to join our team in Palo Alto, California. As an Infrastructure Engineer, you will collaborate closely with our development team to architect, build, and deploy cutting-edge infrastructure solutions. We offer a competitive salary range of $170,000 - $230,000 per year, depending on...


  • Palo Alto, California, United States SambaNova Systems Full time

    Key Responsibilities: Design and implement new features for our runtime/embedded OS stack to support high-performance ML training applications; work on system software support for the next generation RDU system; provide tools and performance profilers for customers to configure and use the Datascale system. Qualifications: Bachelor's degree in Computer Science,...


  • Palo Alto, California, United States Tesla Full time

    Job Description: We are seeking a highly skilled HPC Engineer to join our Supercomputing/AI infrastructure team. In this role, you will be responsible for maintaining and improving our AI infrastructure platform. This includes managing/operating our AI infrastructure, monitoring compute/GPU/network metrics, Linux troubleshooting & performance tuning, and...


  • Palo Alto, California, United States Criteo Full time

    About the Role: As a High-Performance Computing Engineer at Criteo, you will play a key role in designing and developing software that automates traditional system administration tasks. Our team works on building state-of-the-art technologies to manage billions of ad impressions every day. Responsibilities: Design and develop scalable software systems using...


  • Palo Alto, California, United States Criteo Full time

    About the Role: We are looking for a skilled Senior Network Engineer to join our global infrastructure team at Criteo. Criteo is a leader in commerce marketing, driving profit and sales for retailers and brands through its high-performing commerce marketing ecosystem. The ideal candidate will have a strong background in datacenter operations, WAN management,...


  • Palo Alto, California, United States PsiQuantum Full time

    Join Our Team: PsiQuantum is dedicated to fostering a culture of innovation and excellence. We are committed to delivering cutting-edge solutions in quantum computing and empowering our employees to succeed. Key Skills and Qualifications: Expertise in quantum computing and high-performance systems. Experience in software development, particularly in scientific...


  • Palo Alto, California, United States Tesla Full time

    We are looking for a high-performance computing software engineer to join our AI team at Tesla. In this role, you will be responsible for developing and maintaining efficient software for neural network training. You will work closely with cross-functional teams to identify areas for improvement and implement performance optimization techniques to reduce...

  • Senior SRE Engineer

    2 weeks ago


    Palo Alto, California, United States Luma AI Full time

    Join Our Team: Luma AI is a fast-paced, rapidly scaling company that requires experienced professionals like you. As a Senior SRE Engineer - High-Performance Computing, you will collaborate with researchers and engineers to specify availability, performance, correctness, and efficiency requirements of our GPU infrastructure.


  • Palo Alto, California, United States SambaNova Systems Full time

    In this exciting role as a Senior Software Engineer, you will contribute to the development of innovative system software solutions for AI and machine learning applications in high-performance distributed systems. At SambaNova Systems, we value expertise in software engineering, particularly in areas like performance optimization, scalability, and...


  • Palo Alto, California, United States Criteo Full time

    We are looking for an exceptional High Performance Developer to join our team in Palo Alto, California. As a key member of our platform team, you will be responsible for designing and developing high-quality, maintainable code that meets the needs of our business. You will work closely with our cross-functional teams to ensure seamless integration and...


  • Palo Alto, California, United States Palantir Technologies Full time

    Palantir Technologies Overview: We're a data analytics company that helps organizations make better decisions by bringing the right data to the people who need it. Our platforms are used by partners to develop lifesaving drugs, forecast supply chain disruptions, locate missing children, and more. We're looking for talented engineers to join our team and...


  • Palo Alto, California, United States Broadcom Corporation Full time

    Job Description: Broadcom Corporation is a global technology leader that designs, develops, and supplies a broad range of semiconductor and infrastructure software solutions. We are seeking an experienced Principal Software Engineer to join our vMotion team and contribute to the development of our flagship feature. The successful applicant will have a strong...


  • Palo Alto, California, United States Clockwork Inc Full time

    Company Overview: Clockwork Inc is a pioneering startup in Silicon Valley, revolutionizing computer networking and distributed systems. Founded in 2018 by a group of researchers from Stanford University, our high-precision network clock synchronization system delivers up to nanosecond accuracy at scale, powering mission-critical enterprise applications in...


  • Palo Alto, California, United States Tesla Motors Full time

    Job Description: We are seeking a highly skilled Network Routing Chip Engineer to join our team. The ideal candidate will have a strong background in developing C and Python codes for generating routing tables, as well as testing and validating the functionality and performance of routing algorithms and hardware health. In this role, you will be responsible...


  • Palo Alto, California, United States Clockwork Inc Full time

    About Us: Clockwork Inc. is a renowned startup in Silicon Valley, focused on revolutionizing computer networking and distributed systems. We are seeking a highly skilled High-Performance System Developer to contribute to the design and build of our next-generation time-sensitive applications. In this role, you will utilize your expertise in data structures,...


  • Palo Alto, California, United States Tesla Full time

    About the Role: We are looking for a talented High-Performance IC Package Designer to join our team at Tesla. As a High-Performance IC Package Designer, you will be responsible for designing IC packages for high-performance computing projects, including Self-Driving Hardware and Dojo Super AI Computer. You will work closely with IC package process, SI/PI,...


  • Palo Alto, California, United States Rubrik Full time

    About the Role: We are seeking a highly skilled High-Performance Software Systems Engineer to join our team at Rubrik. In this role, you will take full ownership of projects from design to implementation, test and deployment. Your primary focus will be on designing, developing, and delivering hardware and OS abstraction for Rubrik CDM software services. You...


  • Palo Alto, California, United States Gitty Inc. Full time

    About Gitty Inc.: Gitty Inc. is a leading provider of innovative software solutions, based in Palo Alto, CA. We're driven by a passion for delivering exceptional results and exceeding customer expectations. Job Overview: We're seeking a highly skilled Java Software Engineer to join our backend development team. In this role, you'll design, develop, and maintain...


  • Palo Alto, California, United States Criteo Full time

    About the Opportunity: We are looking for a highly experienced Staff Software Engineer to join our team in Palo Alto. As a key member of the platform team, you will be responsible for designing and developing large-scale, distributed systems that meet the highest standards of performance and scalability. Key Responsibilities: Develop high-quality, maintainable...