HPC Cluster Engineer

2 months ago


Westford, Massachusetts, United States Sustainable Talent Full time

Job Summary

We are seeking a highly skilled HPC Cluster Engineer to join our team at Sustainable Talent. As a key member of our organization, you will play a pivotal role in shaping the future of AI, deep learning, and machine learning initiatives.

Key Responsibilities
  • Lead the optimization of our InfiniBand network and manage Lustre and GPFS storage solutions to ensure seamless performance for our cutting-edge initiatives.
  • Utilize your expertise in the SLURM job scheduler to orchestrate the smooth operation of our clusters, from scheduling tasks to managing resources efficiently.
  • Maintain the stability and security of our systems as a Linux sysadmin guru, leveraging your deep understanding of Linux environments.
  • Automate routine tasks and streamline operations using Ansible, freeing up time for innovation and optimization.
  • Develop dynamic solutions to complex challenges through advanced Python and Bash scripting.

Requirements
  • Demonstrated experience with SLURM, coupled with a solid understanding of InfiniBand networks and Lustre/GPFS storage systems, is essential.
  • A proven track record in Linux system administration, ensuring robustness and security in our computing environment.
  • Proficiency in Ansible is a must-have, enabling you to automate tasks and workflows efficiently.
  • Strong scripting abilities in Python and Bash are critical for developing custom solutions and optimizing cluster performance.

What We Offer
  • A competitive salary based on factors such as experience, education, and location.
  • Full benefits and PTO.
  • A dynamic and innovative work environment.

Sustainable Talent is an equal employment opportunity and affirmative action employer. We welcome applications from diverse candidates and are committed to creating an inclusive work environment.