HPC Cluster Engineer

6 days ago


Milpitas, California, United States | KLA Corporation | Full time
About the Role

We are seeking a highly skilled HPC Performance Engineer to join our team at KLA Corporation. As a key member of our HPC team, you will be responsible for designing, implementing, and supporting high-performance compute clusters.

Key Responsibilities
  • Design and Implementation: Design and implement high-performance compute clusters, ensuring optimal performance and efficiency.
  • System Knowledge: Possess in-depth knowledge of HPC systems, including CPU/GPU architecture, scalable and robust storage, high-bandwidth interconnects, and cloud-based computing architectures.
  • Performance Optimization: Identify and resolve processing inefficiencies, drive optimizations that improve cluster utilization, and evaluate the detailed timing of cluster operations.
  • Linux Configuration: Apply strong Linux skills to configure the operating systems on the HPC system.
  • Project Management: Understand and assemble project specifications and performance requirements, adhere to project timelines, and ensure program milestones are completed on time.
  • Product Support: Support the design and release of new products to manufacturing and customers, providing high-quality golden images, procedures, scripts, and documentation.
Qualifications and Education

We are looking for individuals who are inquisitive, thrive on challenges, enjoy problem-solving, and have excellent written and verbal communication skills.

Required Qualifications
  • Linux Knowledge: Proven, in-depth, distribution-agnostic knowledge of Linux systems (SUSE, Red Hat, Rocky, Ubuntu).
  • Parallel Programming: In-depth knowledge of parallel programming, vector-based processing, distributed computing, and code optimization on CPUs and GPUs.
  • Vector Processing: Experience with vector-processing and multithreading technologies and libraries (SIMD, AVX, IPP, MKL, OpenCV, OpenMP, OpenCL, MPI, TBB, CUDA).
  • Performance Profiling: Knowledge of performance profilers (Intel VTune, NVIDIA Nsight Compute, AMD uProf, perf) as well as custom profiling and telemetry tools.
  • HPC Job Schedulers: Knowledge of HPC job schedulers and how they function.
  • Bottleneck Identification: Ability to find bottlenecks and drive them to resolution, whether in data movement, code execution timing, or job scheduling.
  • HPC Hardware Knowledge: Strong HPC hardware knowledge, especially of servers, GPUs, networking, storage, BIOS, and BMCs.
  • Scripting Skills: Ability to write shell and Python scripts for building test environments.
Preferred Qualifications
  • Kubernetes Experience: Experience with Kubernetes, Harbor, Prometheus, and Grafana.
  • Education and Experience: BS or MS degree plus 3 to 5 years of proven experience in computer engineering, electrical engineering, or a related field.
Skills and Abilities
  • Team Orientation: Highly motivated teammate with the ability to develop and maintain collaborative relationships.
  • Organization and Time Management: Able to plan, schedule, organize, and follow up on tasks to achieve goals within or ahead of established time frames.
  • Multi-tasking: Ability to expeditiously organize, coordinate, manage, prioritize, and perform multiple tasks simultaneously.
  • Adaptability to Change: Flexible and supportive, able to adapt to change positively and proactively in a rapidly growing environment.
  • Excellent Communication Skills: Outstanding teammate with excellent written and verbal communication skills.
Minimum Qualifications

Typically requires a doctorate (academic) degree with 0 years of related work experience, a master's degree with 3 years of related work experience, or a bachelor's degree with 5 years of related work experience.

We offer a total rewards package that is competitive and comprehensive, including medical, dental, vision, life, and other voluntary benefits; a 401(k) plan with company matching; an employee stock purchase program (ESPP); student debt assistance; a tuition reimbursement program; development and career growth opportunities and programs; financial planning benefits; wellness benefits, including an employee assistance program (EAP); paid time off and paid company holidays; and family care and bonding leave.

KLA is proud to be an Equal Opportunity Employer. We do not discriminate on the basis of race, religion, color, national origin, sex, gender identity, gender expression, sexual orientation, age, marital status, veteran status, disability status, or any other status protected by applicable law. We will ensure that qualified individuals with disabilities are provided reasonable accommodation to participate in the job application or interview process, to perform essential job functions, and to receive other benefits and privileges of employment.