Network High Performance Computing Engineering

3 weeks ago


Seattle, United States E-Solutions Full time

Job Title: Network High Performance Computing Engineering

Onsite in Seattle 4 days/week.


The client prefers local consultants only.


Key Responsibilities:


  • Designing and deploying HPC clusters consisting of high-performance servers, interconnected by high-speed networks such as InfiniBand (IB) or Ethernet/RoCE with RDMA capabilities.
  • InfiniBand Responsibilities:
  • Fabric Design and Configuration: Designing InfiniBand fabrics, including switches, host channel adapters (HCAs), and cables, to ensure optimal performance, scalability, and fault tolerance. Configuring switch ports, virtual lanes (VLs), and routing tables to facilitate efficient data communication within the InfiniBand fabric.
  • Topology Optimization: Analyzing workload characteristics and traffic patterns to design InfiniBand topologies (e.g., fat-tree, hypercube) that minimize latency and maximize bandwidth utilization. Implementing routing policies and congestion control mechanisms to optimize traffic flow and prevent network congestion.
  • Fabric Monitoring and Management: Monitoring InfiniBand fabric health and performance using management tools such as Subnet Manager (SM) and Performance Monitoring Counters (PMCs). Performing regular maintenance tasks, including firmware updates, port diagnostics, and error detection and correction.
  • Quality of Service (QoS): Implementing QoS policies to prioritize traffic based on application requirements and service levels. Configuring traffic classes, service levels, and virtual lanes (VLs) to ensure predictable performance for latency-sensitive applications.
  • Security and Access Control: Securing the InfiniBand fabric with features such as subnet partitioning (subnet manager security) and encryption to protect data integrity and confidentiality. Enforcing access controls and authentication mechanisms to restrict unauthorized access to the InfiniBand network.
  • RoCE Responsibilities:
  • Network Design and Configuration: Designing and configuring RoCE networks, including switches, network adapters, and Ethernet fabrics, to provide low-latency, high-bandwidth communication for RDMA traffic. Optimizing network settings such as MTU (Maximum Transmission Unit), buffer sizes, and flow control parameters to maximize RoCE performance.
  • Congestion Management: Implementing congestion management mechanisms, such as Priority Flow Control (PFC) and Data Center Bridging (DCB), to prevent congestion and ensure fair allocation of network resources. Monitoring network traffic and congestion levels to dynamically adjust congestion control settings and avoid performance degradation.
  • Routing and Switching Optimization: Configuring RoCE-aware switches and routers to support RDMA traffic and enable efficient routing of packets between endpoints. Tuning switch port settings, forwarding tables, and routing protocols to minimize packet loss and maximize throughput for RoCE traffic.
  • Performance Monitoring and Tuning: Monitoring RoCE network performance metrics, such as latency, throughput, and packet loss, using tools like Ethernet Performance Monitoring (EPM) and InfiniBand Performance Monitoring (IPM). Analyzing performance data to identify bottlenecks, optimize network configurations, and fine-tune RoCE parameters for optimal performance.
  • Security and Authentication: Implementing security measures, such as MACsec (Media Access Control Security) and IPsec (Internet Protocol Security), to encrypt and authenticate RDMA traffic over RoCE networks. Enforcing access controls and certificate-based authentication to ensure secure communication between RoCE endpoints.



Basic Qualifications:

  • Minimum 3 years' experience with InfiniBand architecture, protocols (IBTA), and technologies (e.g., Mellanox InfiniBand). Proficiency in RoCE (RDMA over Converged Ethernet) protocols, including RoCEv2 and related standards.
  • Minimum 3 years' experience in designing and configuring high-performance networks, including InfiniBand fabrics and RoCE-enabled Ethernet networks. Knowledge of fabric design principles, topology optimization, and performance tuning techniques.
  • Minimum 3 years' experience analyzing network performance metrics, diagnosing bottlenecks, and optimizing network configurations for low latency and high throughput. Experience in tuning switch port settings, buffer sizes, and flow control parameters to maximize RoCE performance.
  • Minimum 3 years' experience with security measures for InfiniBand and RoCE networks, including subnet partitioning, encryption, and access controls. Knowledge of authentication mechanisms and cryptographic protocols for securing RDMA traffic.
  • Minimum 3 years' experience with network monitoring tools and techniques for monitoring InfiniBand and RoCE network health and performance. Ability to troubleshoot network issues, diagnose connectivity problems, and resolve performance-related issues.
  • Minimum 3 years' hands-on experience in deploying, managing, and optimizing high-performance computing (HPC) environments and data center networks.
  • Minimum 3 years' experience working with RDMA-enabled applications and parallel computing frameworks (e.g., MPI, OpenMP).
  • Minimum 3 years' experience implementing and troubleshooting complex network configurations, including InfiniBand switches, gateways, and RoCE adapters.
  • High School Diploma or GED.


Preferred Qualifications:


  • Previous experience with network equipment vendor products (e.g., Juniper, Cisco, Arista, OEM).
  • A solid educational foundation in computer science or IT is essential for understanding networking principles and protocols.
  • Certifications from programs offered by vendors such as Mellanox (now NVIDIA Networking) for InfiniBand and RoCE technologies.
  • Bachelor's degree in Computer Science, Computer Engineering, or a relevant technical field, or equivalent practical experience (5 years).


