Senior HPC Infrastructure Engineer
4 weeks ago
As a Senior HPC Infrastructure Engineer at St. Jude Children's Research Hospital, you will play a pivotal role in designing, implementing, and optimizing our state-of-the-art HPC clusters and servers. Your expertise will ensure that our research computing environment excels in scalability, redundancy, and performance.
Key Responsibilities
- Lead the architecture, design, and implementation of advanced HPC/AI systems to support groundbreaking research.
- Oversee the ongoing monitoring, support, and maintenance of our HPC/AI clusters, ensuring peak performance and reliability.
- Drive system upgrades, customization, and integration in close coordination with database administrators, software developers, network operations, and data center teams.
- Manage and maintain a diverse range of computer systems and application software, ensuring they meet the highest standards of functionality and efficiency.
- Ensure continuous support and monitoring of our research computing infrastructure, delivering exceptional service 24/7.
What We Offer
- An opportunity to work with cutting-edge technology in a dynamic, collaborative environment.
- A role that directly impacts the success of groundbreaking research projects.
- A chance to collaborate with top-tier professionals across various disciplines.
Requirements
- Bachelor's degree in Computer Science, Engineering, Business, or a related field of study is required.
- Minimum experience: four (4) years of IT experience, including work in infrastructure operations and engineering environments.
- Experience with Red Hat Enterprise Linux (RHEL) is highly preferred.
- Experience using and supporting Linux in a high-performance computing (HPC) cluster and research computing environment is highly preferred.
- Must have experience managing an HPC cluster.
- Experience with Slurm and/or LSF is highly preferred.
- Experience with Kubernetes (e.g., Rancher, OpenShift, etc.) is a plus.
- Experience with Base Command Manager, Bright Cluster Manager, or another HPC cluster manager (e.g., HPCM, xCAT, Warewulf, Scyld) is highly preferred.
- Experience with IBM Spectrum Scale (GPFS) is required; experience with Lustre is a plus.
- Experience with Message Passing Interface (MPI) is highly preferred.
- Experience with InfiniBand, Ethernet, and TCP/IP networking and topology is highly preferred.
- Experience with HPE Aruba Ethernet switches is preferred.
- Experience with NVIDIA GPUs is required; experience with AMD GPUs is a plus.
- Experience with NVIDIA GPUDirect Storage is a plus.
- Advanced knowledge and in-depth understanding of HPC technologies and principles.
- Must have strong knowledge of Linux security and Linux shell scripting.
- Proven performance in earlier role/comparable role.
-
Senior Controls Engineer
4 weeks ago
Millington, Tennessee, United States Zobility Full time
Job Summary: As a Senior Controls Engineer at Zobility, you will be responsible for leading a team and designing control systems for manufacturing equipment. Your expertise in control panel concept and design, as well as your ability to develop documentation, will be essential in ensuring the workability, completeness, and accuracy of all designs. Key...
-
Senior Network Infrastructure Specialist
4 weeks ago
Millington, Tennessee, United States Saxon Global Full time
Job Description: At Saxon Global, we are seeking a highly skilled Cisco Network Engineer to join our team. The ideal candidate will have a strong background in network architecture, with a focus on Cisco systems. Key responsibilities include troubleshooting network hardware issues, documenting standard processes, and providing technical assistance to onsite...
-
Warehouse Operations Manager
4 weeks ago
Millington, Maryland, United States Avail Infrastructure Solutions Full time
About Avail Infrastructure Solutions: We provide highly engineered technologies, application-critical equipment, and specialized services in the electrical and welding fields. Our company operates through a network of 15 strategically located manufacturing facilities across the globe, employing over 1,750 employees. Our team at Avail Enclosure Systems, an Avail...
-
Field Engineer III
3 weeks ago
Millington, United States MasTec Full time
Position Overview: Responsible for monitoring multiple project activities, including inspections of major equipment received onsite, compliance with specifications, procedures, drawings, submittals and ensuring inspections and tests are qualified and complete. Employee must develop a thorough understanding of the entire project scope and schedule....
-
Electronic Assembler
2 weeks ago
Millington, MD, United States Avail Infrastructure Solutions Full time
Now accepting applications for all levels - training provided based on skills. Full-time and part-time positions available. Up to $18/hour starting with pay-for-performance up to $25/hour.
About Us: From the harshest locations to the most unique operating environments, our custom-engineered enclosure systems are a cost-effective, plug and play solution that...
-
Mechanical Intern
4 weeks ago
Millington, United States Smith Seckman Reid Full time
Description: A leading comprehensive engineering design and consulting firm, SSR provides innovative solutions for clients with facility and infrastructure challenges. To achieve client needs, SSR has multiple locations across the US. Working with a diverse group of individuals in a variety of markets, our team of experts partner with our clients to deliver...