IT InfiniBand/GPU

3 months ago


San Jose, United States Cadence Design Systems Full time
At Cadence, we hire and develop leaders and innovators who want to make an impact on the world of technology.

Cadence is looking for a Sr Staff Systems Engineer at our San Jose location to accelerate strategic customer deployments, ensure on-time bring-up and deployment of HPC infrastructure, troubleshoot issues, and support technical roles working with HPC, InfiniBand, and GPU technologies.

The successful candidate will be a hands-on technical contributor within the infrastructure team and will work with customer interfaces involving the Windows and Linux operating systems.

The Systems Engineer will need experience in Linux environments and proficiency in tasks such as shell scripting.

Role: IT - Sr Staff Systems Engineer

Location: San Jose, CA (on-site, not remote)

Must Haves
  • 15+ years of experience in system administration and engineering.
  • Minimum of five years of overall experience in technical roles supporting GPU infrastructure setup using InfiniBand
  • Experience with interconnections between InfiniBand and GPUs
  • Experience with GPU-enabled MPIs
  • Experience with NVIDIA CUDA or AMD ROCm
  • Experience with H100 and AMD MI210 GPU servers in a cluster
  • Deliver customer deployments and ensure on-time bring-up of GPU servers, including InfiniBand fabric bring-up, configuration, and subnet management on the IB switch
  • Participate in engagements with various SW and FW (BMC/SBIOS/OS/drivers, etc.) teams to develop best-in-class practices and tools; you will analyze, debug, and resolve critical firmware and software issues affecting workload performance at scale
  • Provide engineering solutions that enable large-scale performance strategies for data center GPU computing products and software stacks; maintain technical relationships with internal and external engineering teams; and assist systems engineers in building creative solutions
  • Strong knowledge of Linux operating systems and networking and security concepts.
  • Document and drive acceptance and qualification test plans, procedures, and reports
Requirements
  • Accelerate strategic customer deployments and ensure on-time bring-up and deployment of HPC infrastructure
  • Develop and implement server- and rack-level telemetry; collaborate on and establish continuous improvements in our design flows
  • Recent experience in critical data center technologies such as server architectures, software containers, job schedulers, and parallel computing; deployment and operation of large-scale systems; resilient system design; and clustering of computing resources
  • Manage HPC clusters, actively communicate with management about any equipment problems, and propose resolutions
  • Establish and maintain IT infrastructure and procedures for customer-facing and internal systems
  • Actively establish technical relationships with our customers' engineers, management, and architects at focus accounts
  • Create and develop test plans for new features on each product. Recommend improvements to enable automated scripting for testing and archiving of results. Develop HPC computing strategies for cloud-based computing, GPU-accelerated computing, etc.
  • Provide remote cluster support to large environments, including scalability/flexibility and troubleshooting end-user issues involving job submission, runtime, and resource access.
  • InfiniBand fabric configuration and administration on Red Hat/CentOS Linux, with experience configuring PKeys and troubleshooting the end-to-end InfiniBand environment
  • InfiniBand fabric bring-up, configuration, subnet management, and monitoring on the IB switch and client side for multi-tenancy setups; understanding of IPoIB communication modes
  • Performance comparison of the InfiniBand network with other cluster interconnects, and debugging of InfiniBand performance-related issues
  • Automate configuration management, software updates, and system availability maintenance and monitoring using modern DevOps tools (Ansible, GitLab, etc.)
  • Be a technical specialist on GPU computing and networking products, directly supporting GPU customers
  • Direct experience and strong knowledge of parallel programming, GPU CUDA/ROCm development, and applications.
  • Actively partner with the R&D teams that deliver services on our infrastructure to gather the requirements those services need to run within it
  • Automate repetitive tasks and implement custom solutions using scripting/programming languages such as Bash or Python
  • Configure and troubleshoot a heterogeneous (QDR, FDR, EDR) InfiniBand network and associated subnet manager
  • Experience with high-performance computing interconnects (e.g., 10 and 40 Gigabit Ethernet, InfiniBand)
  • Able to move 50+ pounds



The annual salary range for California is $133,000 to $247,000. You may also be eligible to receive incentive compensation: bonus, equity, and benefits. Sales positions generally offer a competitive On Target Earnings (OTE) incentive compensation structure. Please note that the salary range is a guideline and compensation may vary based on factors such as qualifications, skill level, competencies and work location. Our benefits programs include: paid vacation and paid holidays, 401(k) plan with employer match, employee stock purchase plan, a variety of medical, dental and vision plan options, and more.
We're doing work that matters. Help us solve what others can't.