HPC Engineer

4 weeks ago


Atlanta, United States | Brooksource | Full time

HPC Engineer (HPC and AWS Environment)

100% Remote (9AM-5PM EST Work Hours)

Direct Hire (Full-Time Employment)


We are hiring a High-Performance Computing (HPC) Engineer with experience working in a hybrid on-premises HPC and AWS cloud environment. As an HPC Engineer, you will join an innovative HPC team responsible for configuring, integrating, and managing HPC clusters on AWS cloud for our prestigious client, a private research university based in Atlanta, GA.


You will play a pivotal role in supporting the university's hybrid on-premises and AWS cloud-based HPC infrastructure, while continually expanding and integrating HPC clusters with AWS services to meet the growing scientific computing needs of its researchers. This work allows researchers to run computationally intensive workloads more quickly and securely, particularly in the multidisciplinary field of Artificial Intelligence (AI).


Key Responsibilities:

  • Design, implement, and maintain high-performance computing (HPC) infrastructure on both AWS cloud and on-premises platforms.
  • Manage HPC clusters on AWS cloud using AWS ParallelCluster and all related AWS services including Amazon EC2, AWS CloudFormation, Amazon FSx, and Amazon EFS.
  • Implement and optimize the use of Slurm cluster management software for efficient HPC job scheduling and management.
  • Collaborate with researchers and faculty to understand their scientific computing and machine learning (ML) needs and provide tailored solutions.
  • Actively seek to understand the latest AI research computing requirements and plan infrastructure upgrades to keep up with evolving trends.
  • Provide end users with training, scripting assistance, software installation, and technical troubleshooting.
  • Document use cases, reusable patterns, and technical guidelines.
  • Ensure quality outcomes through best practices in security, infrastructure as code, streamlined release processes, and thorough testing and validation.


Minimum Requirements:

  • 3+ years of experience in Linux administration.
  • 2+ years of experience as an HPC Engineer, including HPC cluster user support and troubleshooting.
  • 1+ year of AWS cloud infrastructure experience with AWS services used for managing HPC clusters including AWS ParallelCluster, EC2, CloudFormation, FSx, and EFS.
  • Experience with Slurm cluster management software.
  • Scripting experience with Python or Bash, as well as related tools such as Ansible and Git.
  • Knowledge of scientific computing and machine learning.


Preferred Qualifications:

  • Experience working with researchers within an academic, research, or scientific institution.
  • Experience with specialized computing including GPU utilization, parallelization, and DevOps aspects such as containerization and automation.
  • Knowledge of scientific data, bioinformatics packages, big data analysis methods, and machine learning algorithms.
  • AWS Certified Solutions Architect certification.
