Staff HPC Engineer
5 months ago
Job Title
Staff HPC Engineer
Location
NASA/AMES, MOFFETT FIELD-CA026
Job Description
ASRC Federal is searching for a Staff HPC Engineer to support Inuteq LLC out of NASA AMES, CA.
ASRC Federal InuTeq provides High Performance Computing services throughout the HPC lifecycle for computational requirements, architecture, acquisition, and operations to federal government customers. Our employees embrace innovation and are committed to a culture of continuous, standards-driven process improvement and assimilation of industry best practices. We are seeking to fill a role that primarily provides development for Supercomputing Batch Scheduling, with secondary support for Supercomputing Systems Administration, on our NASA NACS High Performance Computing (HPC) contract.
Summary: The successful candidate will be an active supporting member of the ASRC Federal team reporting directly to the Manager of the Application Performance and Productivity (APP) group and matrixed directly to the Supercomputing Systems Team Manager.
An individual at this skill level should have demonstrated extensive experience working with common HPC batch schedulers (e.g., PBS, Slurm, or Moab/Torque) while contributing to the support of users of HPC resources on the various issues they might have getting applications to run efficiently. This individual should demonstrate experience installing, maintaining, and upgrading HPC systems. The individual, along with the entire HPC team, will be engaged in the day-to-day operations and support of the HPC resources. Activities may include system patching, OS upgrades, deploying new systems, writing scripts, and troubleshooting system issues on the HPC system. The ability to interact with users to determine symptoms, and then reproduce their issues to isolate the causes, is a critical skill for this work. There will also be activities in testing, benchmarking, user tool scripting, and analyzing trouble tickets to find patterns indicating system or user education issues.
Duties and Responsibilities:
Designs, deploys and maintains HPC clusters with 2000+ nodes with InfiniBand and 100+ petabytes of data storage in production.
Writes and shepherds scalable feature designs through the entire software development process, from requirements and use cases to release.
Designs and develops scripts for system administration, monitoring and usage reporting.
Modifies existing software to correct errors and/or improve performance.
Designs and develops scripts for system regression and performance testing (file systems (Lustre), scheduler (PBS), interconnect (HDR/NDR, Slingshot), high availability, etc.).
Troubleshoots, isolates and resolves application, system and other technical problems (hardware, software, and network).
Understands research use cases, researches and deploys new technologies, defining cost, performance and other trade-offs.
Manages and maintains tools for configuration management (HPCM, Ansible & Git), resource management, scheduling and all necessary aspects of HPC in accordance with best practices.
Researches, deploys and manages networking and security infrastructure, including development of policies and procedures.
Assists in developing and writing proposals and publications.
Creates and provides clear documentation.
Mentors junior staff and cross-trains peers.
Provides after-hours/weekend support as required.
Moderate Supercomputing System Administration that contributes to:
    Day-to-day operations of the Linux HPC clusters and storage systems
    Proactively monitoring, analyzing, and correcting system issues
    Development of scripts to automate repetitive tasks or tools to enhance support of the HPC systems
    System performance analysis and tuning
    Building, installing, and supporting user-requested software
    Supporting evaluation and assessment of new HPC technology
    Resolving user-reported issues and managing support ticket requests in Remedy
Requirements
Bachelor’s degree in computer science or related field
Strong computer science background with in-depth systems-level knowledge in operating systems and networking
A minimum of 5 years of experience administering HPC systems and scheduling software (PBS, Slurm, or Moab/Torque)
A minimum of 5 years of experience in systems programming in heterogeneous, multi-platform HPC environments
Strong ability to analyze, debug and maintain the integrity of an existing code base
Demonstrated equivalent of 5 years of Linux/UNIX user support experience and hands-on experience with administration of Linux systems
Experience working with HPC applications and proficiency in at least C, C++, or Fortran
Superior scripting skills and excellent attention to detail; proficiency in at least Python, Perl, or Bash
Strong ability to interact with customers to understand needs, elicit requirements, and get feedback on prototype solutions
Excellent communication and people skills; excellent time management and organizational skills
Experience with system configuration management tools (e.g., Puppet, Chef, Ansible)
Experience with revision control software (e.g., CVS, SVN, Git)
Track record of delivering commercial-quality software on schedule through multiple release cycles
Proficiency at technical writing
Preferred Skills (Requesting Manager Defines):
Proficiency with analysis and problem-solving skills for debugging and optimization of applications
Familiarity/proficiency with OpenMP and Message Passing Interface (MPI) programming
Experience with Lustre and InfiniBand
Experience with cloud technologies (AWS, Azure, GCP), OpenStack or Kubernetes is a plus