HPC System Admin

4 months ago


Austin, United States NR Consulting Full time

Job Title: HPC System Admin
Work Location: Austin, TX
Position Type: Contract with possible extension
Duration: 12+ Months

Job Description:
Project Details:
Responsible for architecting and implementing Linux High Performance Computing (HPC) clusters. Performs system architecture duties on a Linux HPC cluster, including cluster management, virtualization, cluster usage monitoring, health monitoring, job scheduling, application integration/installation (open source as well as vendor supported), and application performance. Improves cluster performance through kernel changes, firmware updates, library stack changes, and application container management (e.g., Docker).

Mandatory Skills, Technologies, Frameworks, and Methodologies:
Knowledge of Linux and UNIX operating systems, including scripting and programming proficiencies.
Experience with cloud bursting technologies.
Knowledge of cloud services such as AWS SOCA, AWS ParallelCluster, and Azure CycleCloud.
Knowledge of HPC tools and storage: AWS Elastic Fabric Adapter, Azure ANF, Apache Spark or Apache Ignite, Lustre, BeeGFS.
Demonstrated experience in programming system maintenance tasks in C, Java, Perl, batch/shell, or another general-purpose programming language.
Knowledge of NUMA and understanding of NUMA related APIs.
Able to perform complex performance analysis, including system processes, I/O subsystems, networks, and other related components.
Must have experience with multi-threading and parallel processing tools and environments.
Must have experience as a systems administrator. Must have advanced ability to analyze complex IT systems.
Experience with high-performance servers and associated high-performance networks.
Experience installing and maintaining clustered environments, including automated installation methods.
Knowledge of common server hardware architectures, including servers (CPU, bus, memory), SANs, disk arrays, and network hardware.
Understanding of the Red Hat Linux operating system, including processes, files, memory management, and I/O systems; networking services and protocols (e.g., TCP/IP, SSL, FTP, Telnet, LDAP).
Understanding of IP networking, basic routing, TCP ports, and network services, including SSH, LDAP, SFTP, and HTTP(S). Ability to design, promote, and implement change control and configuration management, patch management, high availability systems, and structured design and support methodologies.
Must be organized, with a strong ability to deliver tasks on time, manage multiple efforts, and work with minimal supervision.
Demonstrated ability to proactively learn, adapt to and use new hardware/software technologies.

Good-to-Have Skills, Technologies, Frameworks, and Methodologies:
Performs system administration duties on a Linux HPC cluster, including cluster management, virtualization, cluster usage monitoring, health monitoring, job scheduling, and application integration/installation.
Responsible for system implementation/integration and systems performance analysis.
Manages hardware and software applications in the production environment provided to HPC users.
Installs software and updates.
Coordinates with vendors to resolve hardware and software problems in the HPC cluster.
Facilitates the acquisition of hardware and software products and services for the HPC Cluster.
Knowledge of LSF or open-source job schedulers.
Compiles, configures, and integrates open-source applications into the HPC environment.
Able to learn and use internal software systems.
Monitors the availability of patches and updates, evaluates their importance to the environment, and schedules installations accordingly.
Keeps abreast of the latest HPC hardware and software technology, evaluating technologies as needed.
Designs, implements, and administers high performance computing clusters, performing proofs of concept for technologies such as software containers (e.g., Docker).
Interacts effectively with a broad range of colleagues such as Applied Materials researchers and other IT staff.
Other duties may be assigned.


