AI Inference Systems Engineer
3 weeks ago
We are seeking an experienced AI Inference Systems Engineer to join our growing team at Perplexity AI. Our current stack includes Python, C++, TensorRT-LLM, and Kubernetes, providing a unique opportunity to work on large-scale deployment of machine learning models for real-time inference.
Key Responsibilities:
- Develop APIs for AI inference that will be used by both internal and external customers
- Benchmark and address bottlenecks throughout our inference stack
- Improve the reliability and observability of our systems and respond to system outages
- Explore novel research and implement LLM inference optimizations
Requirements:
- Experience with ML systems and deep learning frameworks (e.g. PyTorch, TensorFlow, ONNX)
- Familiarity with common LLM architectures and inference optimization techniques (e.g. continuous batching, quantization, etc.)
- Experience with deploying reliable, distributed, real-time model serving at scale
- (Optional) Understanding of GPU architectures or experience with GPU kernel programming using CUDA
Compensation:
The cash compensation range for this role is $190,000 - $240,000. In addition to the base salary, equity is part of the total compensation package. Benefits include comprehensive health, dental, and vision insurance for you and your dependents, as well as a 401(k) plan.
-
Senior Inference Optimization Engineer
4 weeks ago
San Francisco, California, United States Liquid AI Full time
About the Role
We're seeking a highly skilled engineer to join our team at Liquid AI, where you'll play a critical role in optimizing inference stacks for our AI models. As a key member of our team, you'll be responsible for taking our models and delivering highly optimized inference stacks that leverage existing frameworks like ggml, vllm, and DeepSpeed to...
-
Senior Inference Systems Engineer
11 hours ago
San Francisco, California, United States Genmo Inc. Full time
At Genmo Inc., we are a research lab dedicated to building state-of-the-art models for video generation. Our goal is to unlock the potential of Artificial General Intelligence (AGI).
Job Overview
We are seeking a senior/staff software engineer to join our inference team. This role involves designing and scaling our inference systems to support millions of...
-
AI Inference Deployment Specialist
2 days ago
San Francisco, California, United States Tbwa ChiatDay Inc Full time
We are seeking an experienced AI Inference Deployment Specialist to join our team at Skild AI. As a key member of our robotics team, you will be responsible for deploying cutting-edge AI models and optimizing their performance in real-world environments.
Role Overview
In this role, you will work closely with our cross-functional team to design and develop...
-
Senior Inference Optimization Engineer
3 weeks ago
San Francisco, California, United States Liquid AI Full time
At Liquid AI, we're seeking a highly skilled engineer to optimize inference stacks tailored to various hardware platforms. The ideal candidate has extensive experience in CUDA, C++, and Triton, as well as a deep understanding of GPU, CPU, and NPU architectures. They should be self-motivated, capable of working independently, and driven by a passion for...
-
Software Engineer, Inference
4 weeks ago
San Francisco, California, United States Anthropic Full time
About Anthropic
Anthropic is a public benefit corporation headquartered in San Francisco, dedicated to creating reliable, interpretable, and steerable AI systems. Our mission is to develop AI that is safe and beneficial for users and society as a whole.
Job Description
We are seeking a skilled Software Engineer, Inference to join our Inference team. As a key...
-
Platform ML Engineering Manager, Inference
4 weeks ago
San Francisco, California, United States OpenAI Full time
About the Team
The Platform ML team is responsible for building the ML side of our internal training framework, which is used to train cutting-edge models. We work on distributed model execution, as well as the interfaces and implementation for model code, training, and inference. Our priorities are to maximize training throughput and researcher throughput,...
-
AI Infrastructure Architect
4 weeks ago
San Francisco, California, United States Together AI Full time
AI Infrastructure Expertise: Design and implement high-performance AI/ML infrastructure, ensuring scalability, availability, and efficient resource utilization.
Automation and Optimization: Develop and deploy automation tools, monitoring solutions, and operational strategies to streamline infrastructure management and reduce manual tasks.
Collaboration and...
-
AI Systems Engineer
3 weeks ago
San Francisco, California, United States LOG10 LLC Full time
Log10 LLC is addressing the challenges around reliability and consistency of LLM-powered applications via a platform that provides AI-powered evaluations, fine-tuning, and debugging tools. We are a team of experienced professionals having previously worked in AI and infra roles at companies such as Intel, MosaicML, Adobe, Docker, PostEra, Starburst, and...
-
Machine Learning Operations Engineer
3 weeks ago
San Francisco, California, United States Together AI Full time
Together AI is seeking an experienced MLOps engineer to develop and deploy scalable AI/ML systems. The ideal candidate will have a strong understanding of machine learning, particularly large language models, and experience with DevOps practices like CI/CD, automation, and containerization.
Key Responsibilities
Design and implement runtime systems for...
-
AI Systems Engineer
3 weeks ago
San Francisco, California, United States Distyl AI Full time
At Distyl AI, we're pushing the boundaries of AI innovation to power core operational workflows for the Fortune 500. We're seeking an experienced Frontend Engineer to join our team and help define the future of work with a focus on human value. You will build the UI/UX patterns in which AI is deployed and used by the world's most important institutions....
-
AI Systems Engineer
3 weeks ago
San Francisco, California, United States LOG10 LLC Full time
About LOG10 LLC
At LOG10 LLC, we're pushing the boundaries of AI-powered applications. Our platform provides AI-powered evaluations, fine-tuning, and debugging tools to address the challenges around reliability and consistency. With a team of experienced professionals from Intel, MosaicML, Adobe, Docker, PostEra, Starburst, and Second Measure, we're...
-
AI Systems Engineer
4 weeks ago
San Francisco, California, United States LOG10 LLC Full time
About Log10 Inc
Log10 is a company that addresses the challenges around reliability and consistency of LLM-powered applications. We provide a platform that offers AI-powered evaluations, fine-tuning, and debugging tools. We are a team of 8, with a background in AI and infrastructure roles at companies like Intel, MosaicML, Adobe, Docker, PostEra, Starburst,...
-
Software Engineer, Inference
4 weeks ago
San Francisco, California, United States Anthropic Full time
About Anthropic
Anthropic's mission is to create reliable, interpretable, and steerable AI systems. We want AI to be safe and beneficial for our users and for society as a whole. Our team is a quickly growing group of committed researchers, engineers, policy experts, and business leaders working together to build beneficial AI systems.
About the role:
Our...
-
Research Scientist
4 weeks ago
San Francisco, California, United States techire ai Full time
Research Scientist - Multimodal AI
We're seeking an experienced Research Scientist to join our team working on cutting-edge foundational multimodal models. As a key member of our research team, you'll be responsible for shaping the future of multimodal generative AI.
About the Role
You'll need to have a strong research focus on generative models, including...
-
San Francisco, California, United States Together AI Full time
Job Responsibilities
Infrastructure Development: Identify and resolve infrastructure gaps to ensure reliable, efficient, and scalable AI/ML solutions.
AI/ML Solutions: Develop advanced AI/ML infrastructure solutions to enhance the efficiency of our ML teams, leveraging expertise in distributed systems and large-scale data processing.
System Design: Design and...
-
AI Systems Architect
3 weeks ago
San Francisco, California, United States Distyl AI Full time
At Distyl AI, we're pushing the boundaries of what's possible with Large Language Models (LLMs). As a Lead AI Engineer, you'll be responsible for architecting and delivering production-grade AI systems that transform how our Fortune 500 clients operate.
Key Responsibilities:
- Develop and deploy AI systems that meet the complex needs of our enterprise...
-
AI Systems Architect
4 weeks ago
San Francisco, California, United States Distyl AI Full time
Lead AI Engineer
At Distyl AI, we're pushing the boundaries of what's possible with Large Language Models (LLMs). We're seeking a talented Lead AI Engineer to join our team and help us deliver production-grade AI systems to our Fortune 500 clients.
Key Responsibilities:
- Develop and architect AI systems that meet the complex needs of our clients
- Partner with our...
-
Staff AI Infrastructure Engineer
4 weeks ago
San Francisco, California, United States Genmo Full time
Role Overview
We are seeking a senior software engineer to join our inference team at Genmo, a research lab dedicated to building open, state-of-the-art models for video generation. The successful candidate will be responsible for designing and scaling our inference systems to support millions of users across multiple data centers.
Key Responsibilities
Develop...
-
Software Engineer, Model Inference Specialist
3 weeks ago
San Francisco, California, United States OpenAI Full time
Key Role: We're seeking a skilled Software Engineer to join our team at OpenAI and contribute to the development of our critical inference infrastructure.
About the Job: As an Inference Infrastructure Engineer, you will work alongside machine learning researchers, engineers, and product managers to bring our latest technologies into production. Your primary...
-
Machine Learning Operations Specialist
4 weeks ago
San Francisco, California, United States Together AI Full time
Together AI is seeking a highly skilled MLOps engineer to develop and deploy scalable inference systems for our customers. The ideal candidate will have a strong understanding of machine learning, particularly large language models, and experience with DevOps practices such as CI/CD, automation, and containerization.
Responsibilities include:
Collaborating with...