Staff HPC Infrastructure Engineer

2 weeks ago


Guardant Health, Palo Alto, California, United States (full time)
Job Overview

Guardant Health is a leading precision oncology company seeking a highly skilled HPC Infrastructure Specialist to join its team. The successful candidate will be responsible for designing, implementing, and maintaining the company's high-performance computing infrastructure.

The ideal candidate will have a strong background in Linux/Unix administration, knowledge of Unix network protocols, and experience with large-scale data storage and compute clusters. Additionally, they will have a solid understanding of cloud-based data centers and experience with building software release and ops processes.

The HPC Infrastructure Specialist will work closely with the company's HPC team to ensure the smooth operation of the computing infrastructure and will be responsible for troubleshooting and resolving technical issues.

The successful candidate will have a strong passion for engineering excellence and a willingness to learn and adapt to new technologies.

Key Responsibilities:

  • Design and implement high-performance computing infrastructure
  • Maintain and troubleshoot the computing infrastructure
  • Work closely with the HPC team to ensure smooth operation of the infrastructure
  • Collaborate with offsite consultants and vendors to maintain and upgrade the infrastructure
  • Participate in a 24/7 on-call rotation

Requirements:

  • 4+ years of Linux/Unix administration experience
  • Knowledge of Unix network protocols and TCP/IP network fundamentals
  • Experience with large-scale data storage and compute clusters
  • 2+ years of experience working in on-premises and cloud-based data centers
  • 3+ years of experience building software release and ops processes
  • 4+ years of experience documenting system administration work

Preferred Skills:

  • Experience administering IBM's General Parallel File System (GPFS)
  • Experience administering the Grid Engine scheduler
  • Experience administering the SLURM scheduler
  • Experience using Bright Cluster Manager
  • Experience with cloud bursting technologies
  • Experience with wide area file systems
  • Experience with Docker and other container technologies
  • Experience with Kubernetes, preferably with Certified Kubernetes Administrator (CKA) certification
  • Experience operating infrastructure compliant with HIPAA and SOX standards

Education:

B.S. in Computer Science or related field


