HPC Engineer

Cold Spring Harbor, United States Crossfire Consulting Full time
HPC (High Performance Computing) Engineer

Full-time position with our client in Cold Spring Harbor, NY

We are a forward-thinking technology organization seeking an experienced HPC Engineer to join our team. The ideal candidate will optimize and maintain our NVIDIA GPU-based high-performance computing infrastructure while collaborating with our technical teams to maximize computational efficiency.

Position Responsibilities

Cluster Implementation and Management:
  • Administers the CSHL HPC cluster and storage system.
  • Installs, optimizes, and maintains HPC software (EasyBuild, Anaconda).
  • Administers HPC workload managers (Slurm, Grid Engine).
  • Collaborates with cross-functional teams to ensure seamless integration of hardware, software, and networking components.
  • Optimizes system performance, scalability, and reliability. Tunes GPU performance and firmware to improve the efficiency and scalability of decentralized AI inference tasks, as well as overall processing performance and utilization.
  • Monitors cluster performance, identifies bottlenecks, and implements performance enhancements.
  • Adheres to best practice models to improve client services including ISO 20000 practices for service and application support, problem and incident management, server technology management, identity and access management, and management of continuous improvement.
  • Provides support and/or services for provisioning, installation/configuration, and maintenance of IT server systems hardware, software, and related infrastructure in alignment with organizational goals and requirements. Supports the CSHL community to adhere to standards for configurations.
  • Participates in new initiatives such as cluster expansion and storage usage efficiency.
  • Manages the full lifecycle of hardware development, from conception through deployment and maintenance.
User Support and/or Services:
  • Creates and updates end-user HPC documentation.
  • Works closely with scientists to optimize computational workloads, data movement, and parallel processing. Trains scientists on using the cluster effectively for AI workloads.
  • Optimizes, deploys, and maintains robust software to support high-performance AI/ML computations and parallel processing.
  • Collaborates with scientists and AI/ML engineers to tailor solutions that meet the specific needs of their research.
  • Provides technical support, troubleshoots issues, and addresses user queries related to the cluster.
  • Assists in developing best practices for AI model training and deployment.
Service Management:
  • As a key member of the IT Systems Engineering team, provides efficient and effective resolution of incidents and problems with a service-centric approach, ensuring the stability and performance of CSHL services.
  • Documents systems configurations, processes, and procedures to ensure reproducible, stable systems that can be efficiently supported by CSHL IT teams. Works with other CSHL IT teams to ensure knowledge transfer resulting in effective resolution of problems.
  • Contributes to the continual improvement of effective management of issues and incidents. Collaborates with other members of the Systems Engineering team to establish and monitor key performance indicators (KPIs) to measure systems and identify areas for improvement.
  • Maintains current knowledge of key technology trends, proactively preparing to assist the community with recommendations.
  • Communicates with and builds strong collaborative relationships with key stakeholders.
Vendor Management:
  • Coordinates with vendors and/or other CSHL teams to aid the procurement of necessary hardware, software, and services, ensuring cost-effective solutions that align with business needs.
Position Requirements

EDUCATION:
  • Bachelor's degree in information technology, computer science, or a related field (or equivalent combination of education and work experience).
EXPERIENCE:
  • 2+ years of experience in GPU computing, with a focus on performance optimization and parallel programming.
  • Proficiency in GPU programming languages such as CUDA.
  • Strong understanding of computer architecture, memory systems and parallel algorithms.
  • Experience with profiling and debugging tools for GPU applications, such as NVIDIA Nsight, desired.
  • Experience in IT system administration and in IT server infrastructure operations.
  • IT Systems Engineering experience, including incident, problem, and request management processes.
  • Strong verbal and written communication skills, including ability to communicate, motivate, and collaborate effectively with diverse groups of people.
  • Ability to troubleshoot and support/drive issues to resolution, including root cause analysis.
SKILLS:
  • Motivated, friendly, committed, and energetic self-starter, dedicated to providing high quality and responsive IT services.
  • Excellent organization, documentation, time management and prioritization skills to manage multiple projects, locations, and technology needs.
  • Ability to maintain problem oversight and manage multiple simultaneous project tasks, prioritizing demands across functional work areas.
  • Ability to establish a practical working knowledge of CSHL business processes, interacting with key users to recommend solutions that best meet the strategic needs.
  • Has a mindset to improve standards, simplify, enhance functionality and/or transition to solutions to improve supportability.

Salary - 90-100K

