High-Performance Inference Engineer
3 weeks ago
As we prepare to deploy our models across various device types, including GPUs, CPUs, and NPUs, we're seeking an expert who can optimize inference stacks tailored to each platform. We're looking for someone with exceptional technical skills and a deep understanding of GPU, CPU, and NPU architectures.
The ideal candidate is a highly skilled engineer with extensive experience in CUDA, C++, and Triton. Proficiency in building and enhancing inference stacks using frameworks like ggml, vLLM, and DeepSpeed is essential. Additionally, experience with mobile development and expertise in cache-aware algorithms will be highly valued.
We estimate the salary range for this role to be between $180,000 and $250,000 per year, depending on location and qualifications.
Key Responsibilities:
- Strong ML Experience: Proficiency in Python and PyTorch to effectively interface with the ML team at a deeply technical level.
- Hardware Awareness: Must understand modern hardware architecture, including cache hierarchies and memory access patterns, and their impact on performance.
- Proficient in Coding: Expertise in Python, PyTorch, and either CUDA, Triton, or C++ is essential for this role.
- Optimization of Low-Level Primitives: Responsible for optimizing core primitives to ensure efficient model execution.
- Self-Guided and Ownership: Ability to independently take a PyTorch model and inference requirements (e.g., maximize GPU throughput or minimize CPU latency) and deliver a fully optimized stack with minimal guidance.
- Research-Driven: Should stay up-to-date with advancements in ML inference, such as new quantization techniques or speculative decoding, while maintaining focus on delivering practical solutions.
-
Inference Performance Specialist
3 weeks ago
San Francisco, California, United States Liquid AI Full time
Job Description: We are looking for a talented Senior Optimization Engineer to join our team and help us develop highly optimized ML inference stacks for various hardware platforms. The successful candidate will have extensive experience in coding, with expertise in Python, PyTorch, CUDA, and C++. They should be able to work independently, taking ownership of...
-
High-Performance AI Model Engineer
15 hours ago
San Francisco, California, United States Perplexity AI Full time
We're revolutionizing information access and knowledge synthesis with our cutting-edge question-answering and information retrieval systems. As an experienced AI Inference Engineer, you'll join our team to work on the internals of our AI inference stack, which runs the neural networks that power our systems. Collaborate closely with AI Model Engineers and...
-
San Francisco, California, United States Acceler8 Talent Full time
Unlock the Full Potential of AI Models: We're driving innovation in on-device AI by optimizing foundation models for superior performance. As part of our Inference Performance Team, you'll push the limits of what's possible with inference efficiency.
Your Key Responsibilities:
- Identify and address performance bottlenecks in reference implementations and our...
-
San Francisco, California, United States ZipRecruiter Full time
Company Overview: Our client is a pioneering company in the field of artificial intelligence, dedicated to creating innovative digital solutions. They are at the forefront of developing high-performance pure digital AI inference chips and seek a skilled Software Architect to lead their software efforts. As a key member of the team, you will be responsible for...
-
Inference Engine Specialist
3 days ago
San Francisco, California, United States Predibase Full time
Job Description: We're looking for an experienced software engineer to join our ML Inference team. As an engineer on this team, you will work on integrating new LLM inference techniques from research to improve latency and throughput of LLM serving systems.
-
Expert Inference System Architect
2 days ago
San Francisco, California, United States Tbwa ChiatDay Inc Full time
At Together AI, we are seeking an experienced Inference System Architect to join our team. As a key member of our Inference Engine team, you will play a crucial role in designing and building high-performance systems that power our AI inference engine.
About the Role: This is a unique opportunity to collaborate with leading AI researchers and engineers to...
-
Senior Inference Systems Engineer
3 weeks ago
San Francisco, California, United States Genmo Full time
We are seeking a highly skilled Senior Inference Systems Engineer to join our team at Genmo, a research lab dedicated to building open, state-of-the-art models for video generation.
Job Summary: The successful candidate will be responsible for designing and scaling our inference systems as they grow to support millions of users across more than 20...
-
High-Performance Computing Specialist
3 weeks ago
San Francisco, California, United States OpenAI Full time
Job Overview: As an inference infrastructure engineer at OpenAI, you will play a critical role in scaling our critical inference infrastructure to meet the growing demands of our customers. This includes efficiently servicing every customer request to use our state-of-the-art AI models, including GPT-4 and Dall-E. In this role, you will collaborate with...
-
Machine Learning Engineer
3 weeks ago
San Francisco, California, United States Together AI Full time
About the Role: We are looking for a talented Machine Learning Engineer to join our team at Together AI. As an MLOps engineer, you will develop systems and APIs that enable our customers to perform inference and fine-tune LLMs.
Responsibilities:
- Develop and deploy systems and APIs that enable customers to perform inference and fine-tune LLMs.
- Work closely...
-
Senior Inference Architect
3 weeks ago
San Francisco, California, United States OpenAI Full time
We are seeking a Senior Inference Architect to join our team at OpenAI. As a key member of our infrastructure engineering team, you will be responsible for scaling up our critical inference infrastructure, which efficiently services every customer request to use our state-of-the-art AI models. In this role, you will work alongside machine learning...
-
Data Inference Specialist
2 months ago
San Francisco, California, United States Perplexity AI Full time
We are seeking an experienced Data Inference Specialist to join our team at Perplexity AI.
Overview: At Perplexity AI, we've achieved tremendous growth and adoption since launching the world's first fully functional conversational answer engine. Our AI-powered search assistant has amassed 10 million monthly active users, with mobile apps installed over 1...
-
Large Scale Inference Specialist
3 weeks ago
San Francisco, California, United States Anyscale Full time
Role Overview: This is a critical role at Anyscale as it allows us to provide market-leading performance and price point for AI infrastructure. As a Large Scale Inference Specialist, you will help push the boundaries of performance for inference at large scale. This involves iterating quickly with product teams to ship end-to-end solutions for Batch and...
-
Machine Learning Engineer
4 weeks ago
San Francisco, California, United States Perplexity AI Full time
Job Description: We are seeking an AI Inference Engineer to join our growing team. As a key member of our engineering team, you will have the opportunity to work on large-scale deployment of machine learning models for real-time inference.
- Benchmark and address bottlenecks throughout our inference stack
- Improve the reliability and observability of our systems...
-
San Francisco, California, United States Genmo Full time
Role Overview: We are seeking a seasoned software engineer to join our inference team at Genmo, a pioneering research lab focused on developing cutting-edge models for video generation. This role presents an exciting opportunity to shape the future of AI and push the boundaries of what's possible in video generation. In this position, you will be responsible...
-
Inference Stack Architect
3 weeks ago
San Francisco, California, United States Liquid AI Full time
Harness Machine Learning Potential: As a key member of our team, you'll play a vital role in shaping the future of machine learning at Liquid AI. With a competitive salary range of $150,000 - $170,000 per annum, depending on experience and qualifications, you'll have the opportunity to grow professionally and make a meaningful impact.
Job Description: Our...
-
High-Performance Systems Engineer
3 weeks ago
San Jose, California, United States Recogni Full time
About the Opportunity: We are seeking a highly experienced Principal Software Engineer to join our world-class engineering team at Recogni. This position requires a strong technical background in software engineering and a passion for developing innovative solutions. This hands-on, technology leadership role involves multi-disciplinary end-to-end system...
-
San Francisco, California, United States Genmo Full time
About Genmo: We are a research lab dedicated to building open, state-of-the-art models for video generation towards unlocking the right brain of Artificial General Intelligence. Our mission is to shape the future of AI and push the boundaries of what's possible in video generation.
Job Summary: We're seeking an experienced Backend Engineer with a strong...
-
High-Performance AI Infrastructure Engineer
21 hours ago
San Francisco, California, United States Crusoe Full time
About Crusoe: Crusoe is pioneering vertically integrated, purpose-built AI infrastructure solutions trusted by Fortune 500 companies to power their most advanced AI applications. Our mission is to align the future of computing with the future of the climate. Recognized as the "gold standard" for reliability and performance, our AI platform is optimized for AI...
-
High Performance Data Engineer
13 hours ago
San Francisco, California, United States Magic AI Full time
Magic AI is at the forefront of building safe Artificial General Intelligence (AGI) that accelerates humanity's progress on the world's most pressing problems. Our approach combines frontier-scale pre-training, domain-specific Reinforcement Learning (RL), ultra-long context, and inference-time compute to achieve this goal.
About the Role: As a High Performance...
-
High-Performance AI Engineering Specialist
3 weeks ago
San Francisco, California, United States Databricks Inc. Full time
Founded in 2020 by a team of innovative machine learning researchers, Mosaic AI empowers businesses to create cutting-edge AI models from scratch using their own data. With a strong commitment to the value of AI models as core intellectual property, Mosaic AI strives to make high-quality AI models accessible to all. As part of Databricks since 2023, our GenAI...