Senior HPC Infrastructure Engineer
3 days ago
We are seeking a highly skilled Senior HPC Infrastructure Engineer to join our team at St. Jude Children's Research Hospital. As a key member of our infrastructure team, you will play a critical role in designing, implementing, and optimizing our high-performance computing (HPC) clusters and servers.
Key Responsibilities
- Lead HPC System Architecture: Design and implement advanced HPC systems to support cutting-edge research projects.
- Infrastructure Operations: Oversee the ongoing monitoring, support, and maintenance of our HPC clusters, ensuring peak performance and reliability.
- System Upgrades and Customization: Drive system upgrades, customization, and seamless integration with database administrators, software developers, network operations, and data center teams.
- Infrastructure Management: Manage and maintain a diverse range of computer systems and application software, ensuring they meet the highest standards of functionality and efficiency.
- Continuous Support and Monitoring: Ensure continuous support and monitoring of our research computing infrastructure, delivering exceptional service 24/7.
- Opportunity to Work with Cutting-Edge Technology: Collaborate with top-tier professionals across various disciplines to advance the field of HPC.
- Impact on Groundbreaking Research: Directly contribute to the success of research projects that aim to cure pediatric catastrophic diseases.
- Collaborative Environment: Work in a dynamic, collaborative environment that values diversity, equity, and inclusion.
- Education: Bachelor's degree in Computer Science, Engineering, Business, or related field of study required; Master's degree preferred.
- Experience: Minimum four (4) years of IT experience, including work in infrastructure operations and engineering environments.
- Technical Skills: Experience with Red Hat Enterprise Linux (RHEL), Linux in HPC environments, HPC cluster management, Slurm, LSF, Kubernetes, IBM Spectrum Scale (GPFS), Message Passing Interface (MPI), InfiniBand, Ethernet and TCP/IP networking, and NVIDIA GPUs.
A reasonable estimate of the current salary range for this role is $94,640 - $169,520 per year.
-
Lead HPC Systems Engineer
1 week ago
Remote, Oregon, United States St. Jude Children's Research Hospital Full time
Overview: As a pivotal member of our innovative team, the Senior HPC Infrastructure Engineer will be instrumental in advancing our high-performance computing (HPC) and artificial intelligence (AI) frameworks. This role focuses on the design, implementation, and enhancement of our sophisticated HPC clusters and servers, ensuring optimal performance and...
-
Lead HPC Systems Engineer
2 weeks ago
Remote, Oregon, United States St. Jude Children's Research Hospital Full time
As a Senior HPC Infrastructure Engineer, you will be instrumental in advancing the capabilities of high-performance computing (HPC) and artificial intelligence (AI) infrastructure. Your role will involve the strategic design, execution, and enhancement of our sophisticated HPC clusters and servers, ensuring optimal performance and reliability in our...
-
Lead HPC Systems Engineer
2 weeks ago
Remote, Oregon, United States St. Jude Children's Research Hospital Full time
Overview: As a Senior HPC Infrastructure Engineer at St. Jude Children's Research Hospital, you will be instrumental in advancing our high-performance computing (HPC) and artificial intelligence (AI) infrastructure. Your role will focus on the design, implementation, and optimization of our sophisticated HPC clusters and servers, ensuring that our research...
-
Staff Software Engineer, SQL Query Engine
2 months ago
Remote, Oregon, United States Bodo Full time
At Bodo, we are driven by a mission to revolutionize how organizations harness the power of data by democratizing efficient compute at scale. With the creation of the first compute engine that brings HPC levels of performance and efficiency to large-scale data processing, we have already helped some of the most data-forward companies in the world with their...
-
Principal Software Engineer, AI Platform
1 month ago
Remote, Oregon, United States General Motors Full time
Description: We are looking for a technical expert to join our team and enhance the robustness and scalability of the infrastructure to support scaling our Machine Learning workloads. This role will involve working across various areas, from enhancing underlying HPC infrastructure to optimizing Kubernetes and Kubeflow setups, as well as refining training...
-
Senior Software Engineer, Infrastructure
2 months ago
Remote, Oregon, United States ngrok Full time
About ngrok Inc.: ngrok empowers developers to build for the internet. This involves challenging problem-solving around networking, reliability, and performance. We build tools for engineers in nearly every Fortune 500 company and are expanding our offerings targeted at production workloads and use cases. And our customers love us: Our employees are low-ego,...
-
Senior DevOps Engineer
1 month ago
Remote, Oregon, United States Nillion Full time
Description: Nillion is humanity's first Blind Computer. It is powered by a decentralized network of nodes that enables "Blind Computation" through the coordination and orchestration of privacy enhancing technologies (PETs) such as multi-party computation (MPC) and homomorphic encryption (HE). Nillion believes Blind Computation will become the internet's base...
-
Senior DevOps Engineer
4 weeks ago
Remote, Oregon, United States CM Group Full time
The Company: Marigold helps brands foster customer relationships through the science and art of connection. Marigold Relationship Marketing is a suite of world-class martech solutions that help marketers create long term customer love and loyalty. Marigold provides the most comprehensive set of use cases for marketers at any level. Headquartered in Nashville,...
-
Engineering Manager, Workflow Infrastructure
2 days ago
Remote, Oregon, United States Netflix Full time
About the Role: We are seeking an experienced Engineering Manager to lead our Workflow Infrastructure team within the Content MiddleWare Infrastructure organization at Netflix. This team builds highly scalable and reliable infrastructure to enable business workflow development within CE/Studio applications. Key Responsibilities: Lead a team of software engineers...
-
Senior DevOps Engineer
1 month ago
Remote, Oregon, United States Level Access Full time
The Level Access DevOps team is responsible for automating, optimizing, and monitoring cloud infrastructure resources to support our engineering teams and provide the best possible experience to our customers. As a member of our team, your expertise will directly impact the efficiency of our software development process, ultimately contributing to the...
-
Senior Engineer, IT Infrastructure, Network
3 months ago
Remote, Oregon, United States Shiseido Full time
Job Summary: The position requires significant knowledge of best practices that fit the Shiseido IT organization, specifically in the infrastructure and system admin space. The individual collaborates closely with the IT organization to provide operations support, monitoring, troubleshooting and upgrades to meet overall business IT infrastructure needs. This...
-
Cloud Infrastructure Specialist
1 week ago
Remote, Oregon, United States Xpertbizz Full time
Job Overview: Essential Qualifications: Proficiency in Nutanix and VMware vRA (vRealize Automation), experience with CALM, knowledge of Ansible or Python. This position is for a Senior Software Engineer focused on infrastructure management and automation. The primary requirement is expertise in vRA and Calm, while other skills are considered supplementary. Key...
-
Cloud Infrastructure Specialist
2 weeks ago
Remote, Oregon, United States Xpertbizz Full time
Job Overview: Essential Qualifications: Proficiency in Nutanix and VMware vRA (vRealize Automation), experience with CALM, knowledge of Ansible or Python. This position is for a Senior Software Engineer focusing on infrastructure management and automation. The primary requirement is expertise in vRA and CALM, while other skills are considered supplementary. Key...
-
Senior Site Reliability Engineer
1 month ago
Remote, Oregon, United States Business Wire Full time
Business Wire, a Berkshire Hathaway company, is the global market leader in press release distribution and regulatory disclosure. We are on a mission to redefine how organizations connect with their audiences, and that's just the beginning. Organizations, large and small, depend on us to accurately publicize market-moving news and multimedia, and generate...
-
Senior Software Engineer
3 days ago
Remote, Oregon, United States Equinix Full time
About Us: Equinix is a leading digital infrastructure company, operating over 250 data centers across the globe. We enable digital leaders to bring together and interconnect foundational infrastructure at software speed, scaling with agility and delivering world-class experiences. Our Culture: We value collaboration, growth, and development of our teams. We hire...
-
Senior Site Reliability Engineer
2 months ago
Remote, Oregon, United States Hypori Inc. Full time
Hypori Inc, a leading provider of SaaS cybersecurity solutions, is transforming secure mobility for federal and commercial customers, including the United States Army. Hypori's secure virtual workspace enables users to access critical data and apps from any mobile device without compromising user privacy. From commercial IP to national security level intel,...
-
Senior Software Engineer
7 hours ago
Remote, Oregon, United States Equinix Full time
About Us: Equinix is a leading digital infrastructure company, operating over 250 data centers across the globe. We enable digital leaders to bring together and interconnect foundational infrastructure at software speed, scaling with agility and delivering world-class experiences. Our Culture: We value collaboration and the growth and development of our teams. We...
-
Staff Software Engineer, Network Infrastructure
1 month ago
Remote, Oregon, United States Starburst Full time
About Starburst: At Starburst, we are working to dismantle the status quo of data silos and vendor lock-in every single day. For decades, database companies have held their customers hostage and we believe that's just plain wrong. Starburst offers a full-featured data lake analytics platform, built on open source Trino. Our platform includes all the...
-
Senior Platform Engineer
2 months ago
Remote, Oregon, United States Glide (glideapps) Full time
Glide is looking for a Senior Platform Engineer to help evolve our data sources strategy, become more efficient with our infrastructure utilization as we scale, and achieve a high level of operational excellence for availability, security, and performance. The ideal candidate has a specialty in provisioning and managing large-scale, persistent, data sources...
-
Senior Cloud Engineer
2 months ago
Remote, Oregon, United States Effectual Full time
Position Summary: Effectual Cloud Engineers (CEs) are members of the Public Sector Program Management team responsible for ensuring that customer-facing projects are delivered with exceptional customer satisfaction and technical excellence. Effectual CEs are "Brand Ambassadors" and are expected to stay current on leading practices to deliver high-quality,...