High-Performance Inference Engineer

3 weeks ago

San Francisco, California, United States Liquid AI Full time

About the Role

As we prepare to deploy our models across a range of device types, including GPUs, CPUs, and NPUs, we're seeking an expert who can build inference stacks optimized for each platform. The role calls for exceptional technical skill and a deep understanding of GPU, CPU, and NPU architectures.

The ideal candidate is a highly skilled engineer with extensive experience in CUDA, C++, and Triton. Proficiency in building and improving inference stacks with frameworks such as ggml, vLLM, and DeepSpeed is essential. Experience with mobile development and expertise in cache-aware algorithms are also highly valued.

We estimate the salary range for this role to be between $180,000 and $250,000 per year, depending on location and qualifications.

Key Responsibilities

Strong ML Experience: Proficiency in Python and PyTorch, sufficient to work with the ML team at a deep technical level.
Hardware Awareness: Must understand modern hardware architecture, including cache hierarchies and memory access patterns, and their impact on performance.
Coding Proficiency: Expertise in Python and PyTorch, plus at least one of CUDA, Triton, or C++, is essential for this role.
Optimization of Low-Level Primitives: Responsible for optimizing core primitives to ensure efficient model execution.
Self-Direction and Ownership: Ability to independently take a PyTorch model and its inference requirements (e.g., maximize GPU throughput or minimize CPU latency) and deliver a fully optimized stack with minimal guidance.
Research-Driven: Keep up with advances in ML inference, such as new quantization techniques or speculative decoding, while remaining focused on delivering practical solutions.

Inference Performance Specialist

3 weeks ago

San Francisco, California, United States Liquid AI Full time

Job Description: We are looking for a talented Senior Optimization Engineer to join our team and help us develop highly optimized ML inference stacks for various hardware platforms. The successful candidate will have extensive experience in coding, with expertise in Python, PyTorch, CUDA, and C++. They should be able to work independently, taking ownership of...

High-Performance AI Model Engineer

15 hours ago

San Francisco, California, United States Perplexity AI Full time

We're revolutionizing information access and knowledge synthesis with our cutting-edge question-answering and information retrieval systems. As an experienced AI Inference Engineer, you'll join our team to work on the internals of our AI inference stack, which runs the neural networks that power our systems. You'll collaborate closely with AI Model Engineers and...

Inference Performance Optimization Specialist

4 days ago

San Francisco, California, United States Acceler8 Talent Full time

Unlock the Full Potential of AI Models: We're driving innovation in on-device AI by optimizing foundation models for superior performance. As part of our Inference Performance Team, you'll push the limits of inference efficiency. Your Key Responsibilities: Identify and address performance bottlenecks in reference implementations and our...

Software Architect Leader for High-Performance AI Inference

6 days ago

San Francisco, California, United States ZipRecruiter Full time

Company Overview: Our client is a pioneering company in the field of artificial intelligence, dedicated to creating innovative digital solutions. They are at the forefront of developing high-performance pure digital AI inference chips and seek a skilled Software Architect to lead their software efforts. As a key member of the team, you will be responsible for...

Inference Engine Specialist

3 days ago

San Francisco, California, United States Predibase Full time

Job Description: We're looking for an experienced software engineer to join our ML Inference team. On this team, you will integrate new LLM inference techniques from research to improve the latency and throughput of LLM serving systems.

Expert Inference System Architect

2 days ago

San Francisco, California, United States Tbwa ChiatDay Inc Full time

At Together AI, we are seeking an experienced Inference System Architect to join our team. As a key member of our Inference Engine team, you will play a crucial role in designing and building high-performance systems that power our AI inference engine. About the Role: This is a unique opportunity to collaborate with leading AI researchers and engineers to...

Senior Inference Systems Engineer

3 weeks ago

San Francisco, California, United States Genmo Full time

We are seeking a highly skilled Senior Inference Systems Engineer to join our team at Genmo, a research lab dedicated to building open, state-of-the-art models for video generation. Job Summary: The successful candidate will be responsible for designing and scaling our inference systems as they grow to support millions of users across more than 20...

High-Performance Computing Specialist

3 weeks ago

San Francisco, California, United States OpenAI Full time

Job Overview: As an inference infrastructure engineer at OpenAI, you will play a critical role in scaling our inference infrastructure to meet the growing demands of our customers. This includes efficiently serving every customer request to use our state-of-the-art AI models, including GPT-4 and DALL-E. In this role, you will collaborate with...

Machine Learning Engineer

3 weeks ago

San Francisco, California, United States Together AI Full time

About the Role: We are looking for a talented Machine Learning Engineer to join our team at Together AI. As an MLOps engineer, you will develop systems and APIs that enable our customers to perform inference and fine-tune LLMs. Responsibilities: Develop and deploy systems and APIs that enable customers to perform inference and fine-tune LLMs. Work closely...

Senior Inference Architect

3 weeks ago

San Francisco, California, United States OpenAI Full time

We are seeking a Senior Inference Architect to join our team at OpenAI. As a key member of our infrastructure engineering team, you will be responsible for scaling up our critical inference infrastructure, which efficiently serves every customer request to use our state-of-the-art AI models. In this role, you will work alongside machine learning...

Data Inference Specialist

2 months ago

San Francisco, California, United States Perplexity AI Full time

We are seeking an experienced Data Inference Specialist to join our team at Perplexity AI. Overview: At Perplexity AI, we've achieved tremendous growth and adoption since launching the world's first fully functional conversational answer engine. Our AI-powered search assistant has amassed 10 million monthly active users, with mobile apps installed over 1...

Large Scale Inference Specialist

3 weeks ago

San Francisco, California, United States Anyscale Full time

Role Overview: This is a critical role at Anyscale, as it allows us to provide market-leading performance and price points for AI infrastructure. As a Large Scale Inference Specialist, you will help push the boundaries of inference performance at large scale. This involves iterating quickly with product teams to ship end-to-end solutions for Batch and...

Machine Learning Engineer

4 weeks ago

San Francisco, California, United States Perplexity AI Full time

Job Description: We are seeking an AI Inference Engineer to join our growing team. As a key member of our engineering team, you will work on large-scale deployment of machine learning models for real-time inference. Benchmark and address bottlenecks throughout our inference stack. Improve the reliability and observability of our systems...

Senior Software Engineer for Scalable AI Inference Systems

3 weeks ago

San Francisco, California, United States Genmo Full time

Role Overview: We are seeking a seasoned software engineer to join our inference team at Genmo, a pioneering research lab focused on developing cutting-edge models for video generation. This role presents an exciting opportunity to shape the future of AI and push the boundaries of what's possible in video generation. In this position, you will be responsible...

Inference Stack Architect

3 weeks ago

San Francisco, California, United States Liquid AI Full time

Harness Machine Learning Potential: As a key member of our team, you'll play a vital role in shaping the future of machine learning at Liquid AI. With a competitive salary range of $150,000 to $170,000 per annum, depending on experience and qualifications, you'll have the opportunity to grow professionally and make a meaningful impact. Job Description: Our...

High-Performance Systems Engineer

3 weeks ago

San Jose, California, United States Recogni Full time

About the Opportunity: We are seeking a highly experienced Principal Software Engineer to join our world-class engineering team at Recogni. This position requires a strong technical background in software engineering and a passion for developing innovative solutions. This hands-on technology leadership role involves multi-disciplinary end-to-end system...

High-Performance Backend Engineer for AI-Powered Content Creation

1 day ago

San Francisco, California, United States Genmo Full time

About Genmo: We are a research lab dedicated to building open, state-of-the-art models for video generation towards unlocking the right brain of Artificial General Intelligence. Our mission is to shape the future of AI and push the boundaries of what's possible in video generation. Job Summary: We're seeking an experienced Backend Engineer with a strong...

High-Performance AI Infrastructure Engineer

21 hours ago

San Francisco, California, United States Crusoe Full time

About Crusoe: Crusoe is pioneering vertically integrated, purpose-built AI infrastructure solutions trusted by Fortune 500 companies to power their most advanced AI applications. Our mission is to align the future of computing with the future of the climate. Recognized as the "gold standard" for reliability and performance, our AI platform is optimized for AI...

High Performance Data Engineer

13 hours ago

San Francisco, California, United States Magic AI Full time

Magic AI is at the forefront of building safe Artificial General Intelligence (AGI) that accelerates humanity's progress on the world's most pressing problems. Our approach combines frontier-scale pre-training, domain-specific Reinforcement Learning (RL), ultra-long context, and inference-time compute to achieve this goal. About the Role: As a High Performance...

High-Performance AI Engineering Specialist

3 weeks ago

San Francisco, California, United States Databricks Inc. Full time

Founded in 2020 by a team of innovative machine learning researchers, Mosaic AI empowers businesses to create cutting-edge AI models from scratch using their own data. With a strong commitment to the value of AI models as core intellectual property, Mosaic AI strives to make high-quality AI models accessible to all. As part of Databricks since 2023, our GenAI...
