Senior HPC Engineer

5 months ago


Mountain View, United States ASRC Federal Holding Company Full time

Job Title

Senior HPC Engineer

Location

NASA/AMES, MOFFETT FIELD-CA026

Job Description

ASRC Federal is searching for a Senior HPC Engineer to support Inuteq LLC. This role is fully telework.

ASRC Federal InuTeq provides High Performance Computing services throughout the HPC lifecycle, covering computational requirements, architecture, acquisition, and operations, to federal government customers. Our employees embrace innovation and are committed to a culture of continuous, standards-driven process improvement and the assimilation of industry best practices. This role primarily provides development for supercomputing batch scheduling, with secondary support for supercomputing systems administration, on our NASA NACS High Performance Computing (HPC) contract.

Summary: The successful candidate will be an active supporting member of the ASRC Federal team, reporting directly to the Manager of the Application Performance and Productivity (APP) group and matrixed directly to the Supercomputing Systems Team Manager.

An individual at this skill level should have demonstrated extensive experience working with common HPC batch schedulers (e.g., PBS, Slurm, or Moab/Torque) while supporting users of HPC resources with the various issues they encounter getting applications to run efficiently. This individual should also demonstrate experience installing, maintaining, and upgrading HPC systems. The individual, along with the entire HPC team, will be engaged in the day-to-day operations and support of the HPC resources. Activities may include system patching, OS upgrades, deploying new systems, writing scripts, and troubleshooting system issues on the HPC system. The ability to interact with users to determine symptoms, and then reproduce their issues to isolate the causes, is a critical skill for this work. There will also be activities in testing, benchmarking, user tool scripting, and analyzing trouble tickets to find patterns that indicate system issues or user education needs.

Duties and Responsibilities:

• Designs, deploys, and maintains production HPC clusters with 2,000+ nodes, InfiniBand interconnects, and 100+ petabytes of data storage.
• Writes and shepherds scalable feature designs through the entire software development process, from requirements and use cases to release.
• Designs and develops scripts for system administration, monitoring, and usage reporting.
• Modifies existing software to correct errors and/or improve performance.
• Designs and develops scripts for system regression and performance testing (file systems such as Lustre, the PBS scheduler, HDR/NDR and Slingshot interconnects, high availability, etc.).
• Troubleshoots, isolates, and resolves application, system, and other technical problems (hardware, software, and network).
• Understands research use cases; researches and deploys new technologies, defining cost, performance, and other trade-offs.
• Manages and maintains tools for configuration management (HPCM, Ansible, and Git), resource management, scheduling, and all other necessary aspects of HPC in accordance with best practices.
• Researches, deploys, and manages networking and security infrastructure, including development of policies and procedures.
• Assists in developing and writing proposals and publications.
• Creates and provides clear documentation.
• Mentors junior staff and cross-trains peers.
• Provides after-hours/weekend support as required.
• Performs moderate supercomputing system administration that contributes to:
    • Day-to-day operations of the Linux HPC clusters and storage systems
    • Proactive monitoring, analysis, and correction of system issues
    • Development of scripts to automate repetitive tasks, or tools to enhance support of the HPC systems
    • System performance analysis and tuning
    • Building, installing, and supporting user-requested software
    • Supporting evaluation and assessment of new HPC technology
    • Resolving user-reported issues and managing support ticket requests in Remedy

Requirements

• Bachelor’s degree in computer science or a related field
• Strong computer science background with in-depth systems-level knowledge of operating systems and networking
• A minimum of 10 years of experience administering HPC systems and scheduling software (PBS, Slurm, or Moab/Torque)
• A minimum of 10 years of experience in systems programming in heterogeneous, multi-platform HPC environments
• Strong ability to analyze, debug, and maintain the integrity of an existing code base
• The equivalent of 5 years of demonstrated Linux/UNIX user support experience, plus hands-on experience administering Linux systems
• Experience working with HPC applications and proficiency in at least one of C, C++, or Fortran
• Superior scripting skills and excellent attention to detail; proficiency in at least one of Python, Perl, or Bash
• Strong ability to interact with customers to understand needs, elicit requirements, and get feedback on prototype solutions
• Excellent communication and people skills; excellent time management and organizational skills
• Experience with system configuration management tools (e.g., Puppet, Chef, Ansible)
• Experience with revision control software (e.g., CVS, SVN, Git)
• Track record of delivering commercial-quality software on schedule, with excellent quality, through multiple release cycles
• Proficiency in technical writing

Preferred Skills (Requesting Manager Defines):

• Strong analysis and problem-solving skills for debugging and optimizing applications
• Familiarity or proficiency with OpenMP and Message Passing Interface (MPI) programming
• Experience with Lustre and InfiniBand
• Experience with cloud technologies (AWS, Azure, GCP), OpenStack, or Kubernetes is a plus

  • Mountain View, California, United States ASRC Federal Holding Company Full time

    Job Title: Senior HPC Engineer. Job Summary: ASRC Federal Holding Company is seeking a highly skilled Senior HPC Engineer to support Inuteq LLC. This role is fully telework. The successful candidate will be an active supporting member of the ASRC Federal team, reporting directly to the Manager of the Application Performance and Productivity (APP) group and...

  • HPC Systems Engineer

    1 month ago


    Mountain View, California, United States ASRC Federal Holding Company Full time

    Job Title: Staff HPC Engineer. Location: NASA/AMES, MOFFETT FIELD-CA026. Job Description: ASRC Federal is seeking a Staff HPC Engineer to support Inuteq LLC out of NASA AMES, CA. Our company provides High Performance Computing services throughout the HPC lifecycle for computational requirements, architecture, acquisition, and operations to federal government customers....


  • Mountain View, California, United States ASRC Federal Holding Company Full time

    Job Description: ASRC Federal Holding Company is seeking a Senior HPC Applications Manager to support Inuteq LLC out of NASA AMES, CA. The successful candidate will directly oversee four HPC-related teams, known as subtasks, in the following areas: HPC Application Services and Tools; HPC Cloud Computing; Data Science Applications supporting HPC Users; HPC...

  • Staff HPC Engineer

    5 months ago


    Mountain View, United States ASRC Federal Holding Company Full time

    Job Title: Staff HPC Engineer. Location: NASA/AMES, MOFFETT FIELD-CA026. Job Description: ASRC Federal is searching for a Staff HPC Engineer to support Inuteq LLC out of NASA AMES, CA. ASRC Federal InuTeq provides High Performance Computing services throughout the HPC lifecycle for computational requirements, architecture, acquisition, and operations to federal...


  • Mountain View, California, United States ASRC Federal Holding Company Full time

    Job Title: Senior HPC Systems Administrator. Location: ASRC Federal Holding Company. Job Description: ASRC Federal Holding Company is seeking a highly skilled Senior HPC Systems Administrator to support our High Performance Computing (HPC) services. The successful candidate will be responsible for designing, deploying, and maintaining HPC clusters with over 2000+...


  • Mountain View, California, United States Groq Full time

    We are seeking a highly skilled Senior Systems Software Engineer to join our team at Groq. As a key member of our multi-disciplinary team, you will play a crucial role in the development, integration, and testing of machine learning HPC platforms. Key Responsibilities: Work within a multi-disciplinary team environment to develop, integrate, and test machine...


  • Mountain View, California, United States Enfabrica Full time

    Technical Expertise: As a Principal Customer Engineer at Enfabrica, you will be responsible for delivering technical solutions to our customers. This role requires a deep understanding of data center and AI/ML/HPC networking technologies, as well as experience in bring-up, troubleshooting, and performance tuning of large-scale DC/HPC/AI/ML cluster...


  • Mountain View, California, United States Enfabrica Full time

    Job Overview: Enfabrica is seeking a highly skilled Principal Customer Engineer to join our team. As a key member of our customer-facing team, you will be responsible for providing technical support and guidance to our customers, ensuring their success with our products and solutions. Key Responsibilities: Provide technical pre-sales support to customers,...


  • Mountain View, California, United States Enfabrica Full time

    Technical Customer Interaction and Support: We are seeking a highly skilled Principal Customer Engineer to join our team at Enfabrica. As a key member of our customer-facing team, you will be responsible for providing technical support and guidance to our customers, ensuring their success with our products and solutions. Key Responsibilities: Provide technical...


  • Mountain View, California, United States Enfabrica Full time

    Principal Customer Engineer: We are seeking a highly skilled Principal Customer Engineer to join our team at Enfabrica. As a key member of our technical team, you will be responsible for delivering exceptional customer experiences and driving technical success for our clients. Key Responsibilities: Present Enfabrica products and solutions to customers and...


  • Mountain View, California, United States Microsoft Corporation Full time

    Job Title: Senior Hardware Engineer. Microsoft Corporation is seeking a highly skilled Senior Hardware Engineer to join our team. As a Senior Hardware Engineer, you will be responsible for designing and developing innovative hardware solutions for our cloud infrastructure. Responsibilities: Lead the design of hardware components and systems; Collaborate with...


  • Mountain View, Arkansas, United States Codeium Full time

    About Codeium: We're a leader in the AI developer tools space, featured on the Forbes AI 50 list. Our mission is to build AI superpowers for developers. We serve one of the largest scale and most demanding LLM applications in the world. Job Description: We're looking for a high performance ML-focused software engineer to join our team. As a key member of our...


  • Mountain View, California, United States Proclaim by Fresh Health, Inc. Full time

    Job Title: Senior Mechanical Engineer. We are seeking a highly skilled Senior Mechanical Engineer to join our team at Proclaim by Fresh Health, Inc. As a Senior Mechanical Engineer, you will be responsible for designing, developing, and testing medical devices that meet the highest standards of quality and safety. Key Responsibilities: Design and develop medical...


  • Mountain View, California, United States The Judge Group Full time

    Job Title: Senior Mechanical Engineer. Job Summary: At The Judge Group, we are seeking a highly skilled Senior Mechanical Engineer to join our team. As a key member of our engineering team, you will be responsible for designing, developing, and implementing mechanical systems and products. Key Responsibilities: Design and develop mechanical systems and products...

  • Senior Cloud Engineer

    2 weeks ago


    Mountain View, California, United States Verily Full time

    Job Summary: We are seeking a highly skilled Senior Cloud Engineer to join our team at Verily. As a Senior Cloud Engineer, you will play a key role in developing and maintaining our cloud platform capabilities and tools. You will work closely with software engineers, hardware engineers, and data scientists to help them learn, adapt, and adopt our core cloud...


  • Mountain View, California, United States Central Business Solutions Full time

    Job Title: Sr. Automation Engineer. Job Summary: Central Business Solutions is seeking a highly skilled Senior Test Automation Engineer to join our team. As a key member of our quality engineering team, you will be responsible for developing and maintaining automated test cases to ensure the highest quality software releases. Responsibilities: Develop and...


  • Mountain View, California, United States Central Business Solutions Full time

    Job Title: Senior Automation Engineer. Job Summary: Central Business Solutions is seeking a highly skilled Senior Automation Engineer to join our team. As a Senior Automation Engineer, you will be responsible for the testing of software solutions developed by the Company. You will leverage your experience in multi-platform environment testing to develop and...


  • Mountain View, California, United States Intuit Full time

    Job Summary: We are seeking an experienced Senior Engineering Manager to lead, mentor, and grow a scrum team of experienced software engineers working on our Marketing Technology Platform. This role will play a pivotal part in driving the technical direction, fostering a culture of innovation and collaboration, and implementing scalable solutions that integrate...


  • Mountain View, Arkansas, United States Codeium Full time

    About Codeium: We're a leader in the AI developer tools space, featured on the Forbes AI 50 list. Our mission is to build AI superpowers for developers. We serve one of the largest scale and most demanding LLM applications in the world. Job Summary: We're seeking a high performance ML-focused software engineer to join our team. As a key member of our engineering...


  • Mountain View, California, United States Microsoft Corporation Full time

    Job Title: Senior Thermal Engineer. We are seeking a highly skilled Senior Thermal Engineer to join our Mixed Reality Team. As a key member of our team, you will be responsible for designing thermal solutions for Mixed Reality devices. Key Responsibilities: Perform CFD simulation and calculations to accurately predict thermal behavior of electronic...