Senior HPC Engineer

3 weeks ago


Mountain View, United States | ASRC Federal Holding Company | Full time
Job Description

ASRC Federal InuTeq provides High Performance Computing services to federal government customers across the entire HPC lifecycle, including computational requirements, architecture, acquisition, and operations. Our employees embrace innovation and are committed to a culture of continuous, standards-driven process improvement and the assimilation of industry best practices. We are seeking to fill a role focused primarily on development for supercomputing batch scheduling, with secondary support for supercomputing systems administration, on our NASA NACS High Performance Computing (HPC) contract.

Summary: The successful candidate will be an active supporting member of the ASRC Federal team, reporting directly to the Manager of the Application Performance and Productivity (APP) group and matrixed to the Supercomputing Systems Team Manager.

An individual at this skill level should have demonstrated extensive experience working with common HPC batch schedulers (e.g., PBS, Slurm, or Moab/Torque) while supporting users of HPC resources with the issues they encounter in getting applications to run efficiently. This individual should also have demonstrated experience installing, maintaining, and upgrading HPC systems. Along with the entire HPC team, the individual will be engaged in the day-to-day operations and support of the HPC resources. Activities may include system patching, OS upgrades, deploying new systems, writing scripts, and troubleshooting issues on the HPC systems. The ability to interact with users to determine symptoms, and then reproduce their issues to isolate the causes, is a critical skill for this work. There will also be activities in testing, benchmarking, user tool scripting, and analyzing trouble tickets to find patterns that indicate system issues or user education needs.

Duties and Responsibilities:

  • Designs, deploys, and maintains production HPC clusters with more than 2,000 nodes connected by InfiniBand and more than 100 petabytes of data storage.
  • Writes scalable feature designs and shepherds them through the entire software development process, from requirements and use cases to release.
  • Designs and develops scripts for system administration, monitoring and usage reporting.
  • Modifies existing software to correct errors and/or improve performance.
  • Designs and develops scripts for system regression and performance testing (file systems such as Lustre, schedulers such as PBS, interconnects such as HDR/NDR InfiniBand and Slingshot, high availability, etc.).
  • Troubleshoots, isolates and resolves application, system and other technical problems (hardware, software, and network).
  • Understands research use cases; researches and deploys new technologies, defining cost, performance, and other trade-offs.
  • Manages and maintains tools for configuration management (HPCM, Ansible, Git), resource management, scheduling, and all other necessary aspects of HPC in accordance with best practices.
  • Researches, deploys and manages networking and security infrastructure, including development of policies and procedures.
  • Assists in developing and writing proposals and publications.
  • Creates and provides clear documentation.
  • Mentors junior staff and cross-trains peers.
  • Provides after-hours/weekend support as required.
  • Performs moderate supercomputing system administration that contributes to:
    • Day-to-day operations of the Linux HPC clusters and storage systems
    • Proactive monitoring, analysis, and correction of system issues
    • Development of scripts to automate repetitive tasks or tools to enhance support of the HPC systems
    • System performance analysis and tuning
    • Building, installing, and supporting user-requested software
    • Supporting evaluation and assessment of new HPC technology
    • Resolving user-reported issues and managing support ticket requests in Remedy

Requirements:

  • Bachelor’s degree in computer science or a related field
  • Strong computer science background with in-depth systems-level knowledge in operating systems and networking
  • A minimum of 10 years of experience administering HPC systems and scheduling software (PBS, Slurm, or Moab/Torque)
  • A minimum of 10 years of experience in systems programming in heterogeneous, multi-platform HPC environments
  • Strong ability to analyze, debug and maintain the integrity of an existing code base
  • Demonstrated experience equivalent to 5 years of Linux/UNIX user support, plus hands-on experience administering Linux systems
  • Experience working with HPC applications and proficiency in at least one of C, C++, or Fortran
  • Superior scripting skills and excellent attention to detail; proficiency in at least one of Python, Perl, or Bash
  • Strong ability to interact with customers to understand needs, elicit requirements, and get feedback on prototype solutions
  • Excellent communication and people skills; excellent time management and organizational skills
  • Experience with system configuration management tools (e.g., Puppet, Chef, Ansible)
  • Experience with revision control software (e.g., CVS, SVN, Git)
  • Track record of delivering commercial-quality software on schedule through multiple release cycles
  • Proficiency at technical writing

Preferred Skills:
  • Strong analysis and problem-solving skills for debugging and optimizing applications
  • Familiarity/proficiency with OpenMP and Message Passing Interface (MPI) programming
  • Experience with Lustre and InfiniBand
  • Experience with cloud technologies (AWS, Azure, GCP), OpenStack or Kubernetes is a plus


EEO Statement

ASRC Federal and its Subsidiaries are Equal Opportunity / Affirmative Action employers. All qualified applicants will receive consideration for employment without regard to race, gender, color, age, sexual orientation, gender identification, national origin, religion, marital status, ancestry, citizenship, disability, protected veteran status, or any other factor prohibited by applicable law.