xPU Specialist

2 months ago


Palo Alto, United States Dice Full time

Our client, 8bit.AI, is a dynamic startup in the Bay Area, CA, hiring full-time employees. The company is focused on developing a high-performance, multi-technology, vendor-independent, xPU-based Accelerated Cloud Computing platform. We stack massive clusters purpose-built for high-performance parallel computing and aim to launch a global accelerated cloud solution. Additionally, the firm will focus on broader Artificial General Intelligence (AGI) products, supercomputing services, and end-to-end AI engineering services.

RESPONSIBILITIES:
Design and implement innovative hardware solutions for highly scalable and efficient xPU PODs.
Collaborate with architects, software engineers, and system engineers to ensure optimal integration of hardware and software components within the PODs.
Deeply understand leading xPU architectures from NVIDIA, AMD, and/or Intel and leverage their capabilities for performance optimization within PODs.
Conduct performance and power analysis to identify and implement strategies for optimizing resource utilization and power consumption within the PODs.
Participate in the development and execution of hardware verification and validation plans.
Stay up to date on the latest advancements in xPU technology and related hardware trends.
Contribute to technical documentation and maintain clear communication within the team.

QUALIFICATIONS:
Master's degree in Electrical Engineering, Computer Engineering, or a related field (or equivalent experience).
Minimum of 8 years of experience in designing and developing hardware solutions, preferably for data center or high-performance computing environments.
Proven experience with virtualization platforms, preferably including VMware (vSphere, ESXi, etc.) OR Nutanix (AHV, AOS, etc.).
Strong understanding of hypervisor technologies and their functionalities.
Ability to integrate and manage both internal and external virtualization platforms.
Must have experience developing and running applications using the ROCm platform, with a strong understanding of ROCm components such as HIP and OpenCL, and of AMD GPU architecture.
In-depth knowledge of xPU architectures, particularly from NVIDIA, AMD, or Intel.
Completed certifications in NVIDIA AI in Datacenter and InfiniBand OR C-DAC certification.
Strong understanding of computer architecture, memory systems, and interfacing techniques.
Solid understanding of OpenStack concepts and experience managing cloud infrastructure.
Prior experience in building and operating Cloud POD infrastructure.
Proficiency in hardware description languages (HDL) like Verilog or VHDL.
Experience with hardware simulation and verification tools.
Excellent communication, collaboration, and problem-solving skills.
A passion for innovation and a drive to contribute to cutting-edge technology development.

GOOD TO HAVE:
Experience with hardware design for data center infrastructure or high-performance computing systems.
Experience with thermal and power management solutions for high-density computing environments.
Familiarity with xPU programming frameworks like CUDA or OpenCL.

Please send your resume to srini at zaspar dot com
