AI Inference Engineer

12 hours ago


Burlingame, United States quadric, Inc Full time

Quadric has created an innovative general purpose neural processing unit (GPNPU) architecture. Quadric's co-optimized software and hardware are targeted to run neural network (NN) inference workloads in a wide variety of edge and endpoint devices, ranging from battery-operated smart-sensor systems to high-performance automotive or autonomous vehicle systems. Unlike other NPUs or neural network accelerators in the industry today, which can only accelerate a portion of a machine learning graph, the Quadric GPNPU executes both NN graph code and conventional C++ DSP and control code.

Role:

The AI Inference Engineer at Quadric is the key bridge between the world of AI/LLM models and Quadric's unique platforms. The engineer will (1) port AI models to the Quadric platform; (2) optimize model deployment for efficient inference; and (3) profile and benchmark model performance. This senior technical role demands deep knowledge of AI model algorithms, system architecture, and AI toolchains/frameworks.

Responsibilities:

  • Quantize, prune, and convert models for deployment
  • Port models to the Quadric platform using the Quadric toolchain
  • Optimize inference deployment for latency and speed
  • Benchmark and profile model performance and accuracy
  • Develop tools to scale and speed up deployment
  • Make improvements to the SDK and runtime
  • Provide technical support and documentation to customers and the developer community

Requirements:

  • Bachelor’s or Master’s in Computer Science and/or Electrical Engineering
  • 5+ years of experience in AI/LLM model inference and deployment frameworks/tools
  • Experience with model quantization (PTQ, QAT) and tools
  • Experience with model accuracy measures
  • Experience with model inference performance profiling
  • Experience with at least one of the following frameworks: onnxruntime, PyTorch, vLLM, huggingface-transformer, neural-compressor, llama.cpp
  • Proficiency in C/C++ and Python
  • Demonstrated strong problem-solving, debugging, and communication skills

Benefits:

  • Health Care Plan (Medical, Dental & Vision)
  • Retirement Plan (401k, IRA)
  • Life Insurance (Basic, Voluntary & AD&D)
  • Paid Time Off (Vacation, Sick & Public Holidays)
  • Family Leave (Maternity, Paternity)
  • Short Term & Long Term Disability
  • Training & Development
  • Work From Home
  • Free Food & Snacks
  • Stock Option Plan


  • AI Inference Engineer

    3 weeks ago


    Burlingame, United States quadric.io Full time

    Quadric has created an innovative general purpose neural processing unit (GPNPU) architecture. Quadric's co-optimized software and hardware is targeted to run neural network (NN) inference workloads in a wide variety of edge and endpoint devices, ranging from battery operated smart-sensor systems to high-performance automotive or autonomous vehicle systems....

  • AI Inference Engineer

    2 weeks ago


    Burlingame, United States quadric.io, Inc Full time

    Quadric has created an innovative general purpose neural processing unit (GPNPU) architecture. Quadric's co-optimized software and hardware is targeted to run neural network (NN) inference workloads in a wide variety of edge and endpoint devices, ranging from battery operated smart-sensor systems to high-performance automotive or autonomous vehicle systems....


  • Burlingame, CA, United States quadric.io Full time

    Quadric has created an innovative general purpose neural processing unit (GPNPU) architecture. Quadric's co-optimized software and hardware is targeted to run neural network (NN) inference workloads in a wide variety of edge and endpoint devices, ranging from battery operated smart-sensor systems to high-performance automotive or autonomous vehicle systems....


  • AI Inference Engineer

    2 weeks ago


    Burlingame, CA, United States quadric.io Full time

    Quadric has created an innovative general purpose neural processing unit (GPNPU) architecture. Quadric's co-optimized software and hardware is targeted to run neural network (NN) inference workloads in a wide variety of edge and endpoint devices, ranging from battery operated smart-sensor systems to high-performance automotive or autonomous vehicle systems....


  • Burlingame, United States quadric.io, Inc Full time

    A pioneering tech company is looking for an experienced AI Inference Engineer to bridge AI models and advanced processing platforms. This role requires expertise in AI model algorithms, strong C/C++ and Python skills, and experience with deployment frameworks. You will optimize and benchmark AI models, ensuring efficient deployment in edge devices. The...


  • Burlingame, United States Quadric Inc. Full time

    A leading technology company in California is seeking an AI Inference Engineer to bridge AI models with unique platforms. Key responsibilities include model optimization, deployment, and performance profiling. Candidates should have a Bachelor’s or Master’s degree, 5+ years' experience in AI frameworks, and proficiency in C/C++ and Python. Competitive...

  • AI Kernel Engineer

    2 weeks ago


    Burlingame, CA, United States quadric.io Full time

    Quadric has created an innovative general purpose neural processing unit (GPNPU) architecture. Quadric's co-optimized software and hardware is targeted to run neural network (NN) inference workloads in a wide variety of edge and endpoint devices, ranging from battery operated smart-sensor systems to high-performance automotive or autonomous vehicle systems....

  • AI Kernel Engineer

    4 days ago


    Burlingame, CA, United States quadric.io Full time

    Quadric has created an innovative general purpose neural processing unit (GPNPU) architecture. Quadric's co-optimized software and hardware is targeted to run neural network (NN) inference workloads in a wide variety of edge and endpoint devices, ranging from battery operated smart-sensor systems to high-performance automotive or autonomous vehicle systems....


  • Burlingame, United States Genesis Molecular AI Full time

    About the Team We’re a tight-knit team of proven drug hunters, deep learning researchers, and software engineers united by a common mission — drive AI innovation in biochemistry, discovering and developing groundbreaking therapies for patients suffering from severe disorders. Genesis AI team is focused on developing foundation models for small molecule...