Senior HPC Infrastructure Engineer

3 days ago


Remote, Oregon, United States | St. Jude Children's Research Hospital | Full time
About the Role

We are seeking a highly skilled Senior HPC Infrastructure Engineer to join our team at St. Jude Children's Research Hospital. As a key member of our infrastructure team, you will play a critical role in designing, implementing, and optimizing our high-performance computing (HPC) clusters and servers.

Key Responsibilities
  • Lead HPC System Architecture: Design and implement advanced HPC systems to support cutting-edge research projects.
  • Infrastructure Operations: Oversee the ongoing monitoring, support, and maintenance of our HPC clusters, ensuring peak performance and reliability.
  • System Upgrades and Customization: Drive system upgrades and customization, coordinating closely with database administrators, software developers, network operations, and data center teams so that changes integrate seamlessly.
  • Infrastructure Management: Manage and maintain a diverse range of computer systems and application software, ensuring they meet the highest standards of functionality and efficiency.
  • Continuous Support and Monitoring: Ensure continuous support and monitoring of our research computing infrastructure, delivering exceptional service 24/7.
What We Offer
  • Opportunity to Work with Cutting-Edge Technology: Collaborate with top-tier professionals across various disciplines to advance the field of HPC.
  • Impact on Groundbreaking Research: Directly contribute to the success of research projects that aim to cure pediatric catastrophic diseases.
  • Collaborative Environment: Work in a dynamic, collaborative environment that values diversity, equity, and inclusion.
Requirements
  • Education: Bachelor's degree in Computer Science, Engineering, Business, or related field of study required; Master's degree preferred.
  • Experience: Minimum of four (4) years of IT experience, including work in infrastructure operations and engineering environments.
  • Technical Skills: Experience with Red Hat Enterprise Linux (RHEL), Linux in HPC environments, HPC cluster management, Slurm, LSF, Kubernetes, IBM Spectrum Scale (GPFS), Message Passing Interface (MPI), InfiniBand, Ethernet and TCP/IP networking, and NVIDIA GPUs.
Compensation

A reasonable estimate of the current salary range for this role is $94,640 - $169,520 per year.


