Staff HPC Infrastructure Engineer

4 weeks ago


Palo Alto, California, United States · Guardant Health · Full time
Job Description

About the Role:

Guardant’s HPC team builds and operates the computational technology backbone of the company. 

This includes scalable data storage that holds petabytes of genomics data, high-performance compute clusters that run a custom bioinformatics pipeline in production and R&D environments, and the software infrastructure that hosts an ecosystem of services for internal data processing and external data integration. To support Guardant Health's rapid growth over the next few years, the HPC team is looking for a technically strong engineer who can help maintain and grow the HPC infrastructure through its aggressive expansion, working closely with the corporate IT, SQA, and DevOps/SRE teams.

We prefer a candidate who is local to the San Francisco Bay Area and can work on-site in Redwood City and Palo Alto, but this role can be worked mostly remotely. Being present at the work location is required while on rotation, during maintenance, and during cluster deployments.

Essential Duties and Responsibilities:

Act as a technical lead in day-to-day operations

Help manage the HPC interconnects

Help integrate the HPC systems with the bandwidth-on-demand system

Help integrate the HPC systems with the single-namespace storage system

Help integrate cloud bursting as part of the HPC abstraction work

Work with the networking infrastructure team to manage and optimize the connectivity to and from the HPC systems and locales

Help manage multiple HPC clusters and cluster file systems. 

Help research, develop, and implement the next-generation HPC solution

Troubleshoot the production system stack down to the source-code level (e.g., shell scripts, Python, and other languages).

Maintain, monitor, and support the infrastructure environment and/or facilities.

Use and maintain enhanced production monitoring and related capabilities.

Support improvements for increased system reliability and performance.

Support multiple systems or applications of medium to high complexity (complexity defined by size, technology used, and system feeds and interfaces) with multiple concurrent users, ensuring control, integrity, and accessibility.

Support systems at remote locations, including international sites

Work with offsite consultants to maintain the infrastructure

Work with vendors to troubleshoot, upgrade and repair systems as needed

Participate in a 24/7 on-call rotation


