AI Inference Systems Engineer

3 weeks ago

San Francisco, California, United States Perplexity AI Full time

We are seeking an experienced AI Inference Systems Engineer to join our growing team at Perplexity AI. Our current stack includes Python, C++, TensorRT-LLM, and Kubernetes, providing a unique opportunity to work on large-scale deployment of machine learning models for real-time inference.

Key Responsibilities:

Develop APIs for AI inference that will be used by both internal and external customers
Benchmark and address bottlenecks throughout our inference stack
Improve the reliability and observability of our systems and respond to system outages
Explore novel research and implement LLM inference optimizations

Requirements:

Experience with ML systems and deep learning frameworks (e.g. PyTorch, TensorFlow, ONNX)
Familiarity with common LLM architectures and inference optimization techniques (e.g. continuous batching, quantization)
Experience with deploying reliable, distributed, real-time model serving at scale
(Optional) Understanding of GPU architectures or experience with GPU kernel programming using CUDA

Compensation:

The cash compensation range for this role is $190,000 - $240,000. In addition to the base salary, equity is part of the total compensation package. Benefits include comprehensive health, dental, and vision insurance for you and your dependents, as well as a 401(k) plan.

Senior Inference Optimization Engineer

4 weeks ago

San Francisco, California, United States Liquid AI Full time

About the Role: We're seeking a highly skilled engineer to join our team at Liquid AI, where you'll play a critical role in optimizing inference stacks for our AI models. As a key member of our team, you'll be responsible for taking our models and delivering highly optimized inference stacks that leverage existing frameworks like ggml, vllm, and DeepSpeed to...
Senior Inference Systems Engineer

11 hours ago

San Francisco, California, United States Genmo Inc. Full time

At Genmo Inc., we are a research lab dedicated to building state-of-the-art models for video generation. Our goal is to unlock the potential of Artificial General Intelligence (AGI). Job Overview: We are seeking a senior/staff software engineer to join our inference team. This role involves designing and scaling our inference systems to support millions of...
AI Inference Deployment Specialist

2 days ago

San Francisco, California, United States Tbwa ChiatDay Inc Full time

We are seeking an experienced AI Inference Deployment Specialist to join our team at Skild AI. As a key member of our robotics team, you will be responsible for deploying cutting-edge AI models and optimizing their performance in real-world environments. Role Overview: In this role, you will work closely with our cross-functional team to design and develop...
Senior Inference Optimization Engineer

3 weeks ago

San Francisco, California, United States Liquid AI Full time

At Liquid AI, we're seeking a highly skilled engineer to optimize inference stacks tailored to various hardware platforms. The ideal candidate has extensive experience in CUDA, C++, and Triton, as well as a deep understanding of GPU, CPU, and NPU architectures. They should be self-motivated, capable of working independently, and driven by a passion for...
Software Engineer, Inference

4 weeks ago

San Francisco, California, United States Anthropic Full time

About Anthropic: Anthropic is a public benefit corporation headquartered in San Francisco, dedicated to creating reliable, interpretable, and steerable AI systems. Our mission is to develop AI that is safe and beneficial for users and society as a whole. Job Description: We are seeking a skilled Software Engineer, Inference to join our Inference team. As a key...
Platform ML Engineering Manager, Inference

4 weeks ago

San Francisco, California, United States OpenAI Full time

About the Team: The Platform ML team is responsible for building the ML side of our internal training framework, which is used to train cutting-edge models. We work on distributed model execution, as well as the interfaces and implementation for model code, training, and inference. Our priorities are to maximize training throughput and researcher throughput,...
AI Infrastructure Architect

4 weeks ago

San Francisco, California, United States Together AI Full time

AI Infrastructure Expertise: Design and implement high-performance AI/ML infrastructure, ensuring scalability, availability, and efficient resource utilization. Automation and Optimization: Develop and deploy automation tools, monitoring solutions, and operational strategies to streamline infrastructure management and reduce manual tasks. Collaboration and...
AI Systems Engineer

3 weeks ago

San Francisco, California, United States LOG10 LLC Full time

Log10 LLC is addressing the challenges around reliability and consistency of LLM-powered applications via a platform that provides AI-powered evaluations, fine-tuning, and debugging tools. We are a team of experienced professionals having previously worked in AI and infra roles at companies such as Intel, MosaicML, Adobe, Docker, PostEra, Starburst, and...
Machine Learning Operations Engineer

3 weeks ago

San Francisco, California, United States Together AI Full time

Together AI is seeking an experienced MLOps engineer to develop and deploy scalable AI/ML systems. The ideal candidate will have a strong understanding of machine learning, particularly large language models, and experience with DevOps practices like CI/CD, automation, and containerization. Key Responsibilities: Design and implement runtime systems for...
AI Systems Engineer

3 weeks ago

San Francisco, California, United States Distyl AI Full time

At Distyl AI, we're pushing the boundaries of AI innovation to power core operational workflows for the Fortune 500. We're seeking an experienced Frontend Engineer to join our team and help define the future of work with a focus on human value. You will build the UI/UX patterns in which AI is deployed and used by the world's most important institutions....
AI Systems Engineer

3 weeks ago

San Francisco, California, United States LOG10 LLC Full time

About LOG10 LLC: At LOG10 LLC, we're pushing the boundaries of AI-powered applications. Our platform provides AI-powered evaluations, fine-tuning, and debugging tools to address the challenges around reliability and consistency. With a team of experienced professionals from Intel, MosaicML, Adobe, Docker, PostEra, Starburst, and Second Measure, we're...
AI Systems Engineer

4 weeks ago

San Francisco, California, United States LOG10 LLC Full time

About Log10 Inc: Log10 is a company that addresses the challenges around reliability and consistency of LLM-powered applications. We provide a platform that offers AI-powered evaluations, fine-tuning, and debugging tools. We are a team of 8, with a background in AI and infrastructure roles at companies like Intel, MosaicML, Adobe, Docker, PostEra, Starburst,...
Software Engineer, Inference

4 weeks ago

San Francisco, California, United States Anthropic Full time

About Anthropic: Anthropic's mission is to create reliable, interpretable, and steerable AI systems. We want AI to be safe and beneficial for our users and for society as a whole. Our team is a quickly growing group of committed researchers, engineers, policy experts, and business leaders working together to build beneficial AI systems. About the role: Our...
Research Scientist

4 weeks ago

San Francisco, California, United States techire ai Full time

Research Scientist - Multimodal AI: We're seeking an experienced Research Scientist to join our team working on cutting-edge foundational multimodal models. As a key member of our research team, you'll be responsible for shaping the future of multimodal generative AI. About the Role: You'll need to have a strong research focus on generative models, including...
Software Engineer, AI/ML Infrastructure Specialist

3 weeks ago

San Francisco, California, United States Together AI Full time

Job Responsibilities. Infrastructure Development: Identify and resolve infrastructure gaps to ensure reliable, efficient, and scalable AI/ML solutions. AI/ML Solutions: Develop advanced AI/ML infrastructure solutions to enhance the efficiency of our ML teams, leveraging expertise in distributed systems and large-scale data processing. System Design: Design and...
AI Systems Architect

3 weeks ago

San Francisco, California, United States Distyl AI Full time

At Distyl AI, we're pushing the boundaries of what's possible with Large Language Models (LLMs). As a Lead AI Engineer, you'll be responsible for architecting and delivering production-grade AI systems that transform how our Fortune 500 clients operate. Key Responsibilities: Develop and deploy AI systems that meet the complex needs of our enterprise...
AI Systems Architect

4 weeks ago

San Francisco, California, United States Distyl AI Full time

Lead AI Engineer: At Distyl AI, we're pushing the boundaries of what's possible with Large Language Models (LLMs). We're seeking a talented Lead AI Engineer to join our team and help us deliver production-grade AI systems to our Fortune 500 clients. Key Responsibilities: Develop and architect AI systems that meet the complex needs of our clients. Partner with our...
Staff AI Infrastructure Engineer

4 weeks ago

San Francisco, California, United States Genmo Full time

Role Overview: We are seeking a senior software engineer to join our inference team at Genmo, a research lab dedicated to building open, state-of-the-art models for video generation. The successful candidate will be responsible for designing and scaling our inference systems to support millions of users across multiple data centers. Key Responsibilities: Develop...
Software Engineer, Model Inference Specialist

3 weeks ago

San Francisco, California, United States OpenAI Full time

Key Role: We're seeking a skilled Software Engineer to join our team at OpenAI and contribute to the development of our critical inference infrastructure. About the Job: As an Inference Infrastructure Engineer, you will work alongside machine learning researchers, engineers, and product managers to bring our latest technologies into production. Your primary...
Machine Learning Operations Specialist

4 weeks ago

San Francisco, California, United States Together AI Full time

Together AI is seeking a highly skilled MLOps engineer to develop and deploy scalable inference systems for our customers. The ideal candidate will have a strong understanding of machine learning, particularly large language models, and experience with DevOps practices such as CI/CD, automation, and containerization. Responsibilities include: Collaborating with...
