AI Inference Engineer

2 weeks ago

San Francisco, United States Perplexity AI Full time

Job Description

We are looking for an AI Inference Engineer to join our growing team. Our current stack is Python, C++, TensorRT-LLM, and Kubernetes. You will have the opportunity to work on large-scale deployment of machine learning models for real-time inference.

Responsibilities

Develop APIs for AI inference that will be used by both internal and external customers
Benchmark and address bottlenecks throughout our inference stack
Improve the reliability and observability of our systems and respond to system outages
Explore novel research and implement LLM inference optimizations

Qualifications

Experience with ML systems and deep learning frameworks (e.g. PyTorch, TensorFlow, ONNX)
Familiarity with common LLM architectures and inference optimization techniques (e.g. continuous batching, quantization, etc.)
Experience with deploying reliable, distributed, real-time model serving at scale
(Optional) Understanding of GPU architectures or experience with GPU kernel programming using CUDA

The cash compensation range for this role is $190,000 - $240,000.

At Perplexity, we've experienced tremendous growth and adoption since publicly launching the world's first fully functional conversational answer engine just over a year ago. Our AI-powered search assistant has amassed 10 million monthly active users as of early 2024, with our mobile apps installed over 1 million times across iOS and Android devices. In 2023 alone, we served over 500 million queries from users around the globe.
To support our rapid expansion, we've raised significant funding from some of the most respected investors in technology. In January 2024, we raised $73.6 million in a Series B round led by IVP, with participation from NVIDIA, Jeff Bezos' investment fund, NEA, Databricks, and other prominent firms. We followed that up with a $62.7 million Series B1 round in April 2024 led by Daniel Gross, valuing Perplexity at over $1 billion.
Our prominent investor base includes IVP, NEA, Jeff Bezos, NVIDIA, Databricks, Bessemer Venture Partners, Elad Gil, Nat Friedman, Naval Ravikant, Tobi Lutke, and many other visionary individuals.

Final offer amounts are determined by multiple factors, including experience and expertise, and may vary from the amounts listed above.
Equity: In addition to the base salary, equity is part of the total compensation package.
Benefits: Comprehensive health, dental, and vision insurance for you and your dependents. Includes a 401(k) plan.

AI Inference Deployment Specialist

2 weeks ago

San Francisco, California, United States Tbwa ChiatDay Inc Full time

We are seeking an experienced AI Inference Deployment Specialist to join our team at Skild AI. As a key member of our robotics team, you will be responsible for deploying cutting-edge AI models and optimizing their performance in real-world environments. Role Overview: In this role, you will work closely with our cross-functional team to design and develop...
AI Applications Engineer

3 weeks ago

San Francisco, United States Untether AI Full time

Untether AI is looking for a talented AI Applications Engineer to join our Product team to support our customers with the SDK for our custom AI accelerator devices. You will be working with data scientists to ensure their AI workloads are ported and running efficiently on Untether AI products. Must be a US or Canadian citizen to apply. Ideal candidate profile: You...
Senior Inference Systems Engineer

2 weeks ago

San Francisco, California, United States Genmo Inc. Full time

At Genmo Inc., we are a research lab dedicated to building state-of-the-art models for video generation. Our goal is to unlock the potential of Artificial General Intelligence (AGI). Job Overview: We are seeking a senior/staff software engineer to join our inference team. This role involves designing and scaling our inference systems to support millions of...
Software Engineer, Model Inference

2 weeks ago

San Francisco, United States OpenAI Full time

Applied AI Engineering - San Francisco. About the Team: Our team brings OpenAI’s most capable technology to the world through our products. Most recently, we released ChatGPT, GPT-4, the Whisper API, and DALL-E. We empower consumers and developers alike to use and access our...
Senior Software Engineer

2 weeks ago

San Francisco, United States CentML Full time

About Us We believe AI will fundamentally transform how people live and work. CentML's mission is to massively reduce the cost of developing and deploying ML models so we can enable anyone to harness the power of AI and everyone to benefit from its potential. Our founding team is made up of experts in AI, compilers, and ML hardware and has led...
Senior Software Engineer

3 days ago

San Francisco, United States Untether AI Full time

We are looking for best in class engineers to join our existing top-notch team. When you join Untether AI, you will be part of a team that designs, develops and verifies the software that interacts with our chip, collaborating with our hardware engineers and with fellow software engineers in the process. By creating software that fully realizes the...
Senior Software Engineer

2 weeks ago

San Francisco, United States CentML Inc. Full time

About Us: We believe AI will fundamentally transform how people live and work. CentML's mission is to massively reduce the cost of developing and deploying ML models so we can enable anyone to harness the power of AI and everyone to benefit from its potential. Our founding team is made up of experts in AI, compilers, and ML hardware and has led efforts at...
AI Engineer

3 weeks ago

San Francisco, United States Hyperbolic Labs Full time

Who We Are: Hyperbolic Labs is on a mission to democratize AI by breaking down the barriers to computing power with our Open-Access AI Cloud. By making better use of idle computing resources across the globe, we offer an innovative GPU marketplace and AI inference service that promise affordability and accessibility for all. As pioneers at the intersection...
Senior Software Engineer

3 weeks ago

San Francisco, United States Untether AI Full time

We are looking for best in class engineers to join our existing top-notch team. When you join Untether AI, you will be part of a team that designs, develops and verifies the software that interacts with our chip, collaborating with our hardware engineers and with fellow software engineers in the process. By creating software that fully realizes the...
Software Engineer, Model Inference

1 week ago

San Francisco, United States OpenAI Full time

About the Team: Our team brings OpenAI’s most capable technology to the world through our products. Most recently, we released ChatGPT, GPT-4, the Whisper API, and DALL-E. We empower consumers and developers alike to use and access our state-of-the-art AI models, allowing them to do things that they’ve never been able to before. Across all product lines, we...
Software Engineer, Model Inference

4 weeks ago

San Francisco, United States OpenAI Full time

About the Team: Our team brings OpenAI's most capable technology to the world through our products. Most recently, we released ChatGPT, GPT-4, the Whisper API, and DALL-E. We empower consumers and developers alike to use and access our state-of-the-art AI models, allowing them to do things that they've never been able to before. Across all product lines, we...
Lead AI Engineer

4 weeks ago

San Francisco, United States Distyl AI Full time

Distyl AI develops production-grade AI systems to power core operational workflows for the Fortune 500. Working in partnership with OpenAI, Distyl brings deep expertise in enterprise AI, and technical investments that support the development of production-grade AI systems with rapid time-to-value. Led by proven leaders from top companies like Palantir and...
Full Stack Engineer

1 week ago

San Francisco, United States Virtue AI Full time

Virtue AI is based in San Francisco and this position will be onsite in San Francisco. About The Role: The future of AI will depend on our ability to keep it safe and responsible. We're seeking an experienced Full Stack Engineer to champion our efforts in doing so. Virtue AI seeks an experienced Full Stack Engineer who is passionate about AI safety and...
Full Stack Engineer

4 weeks ago

San Francisco, United States Virtue AI Full time

Virtue AI is based in San Francisco and this position will be onsite in San Francisco. About The Role: The future of AI will depend on our ability to keep it safe and responsible. We're seeking an experienced Full Stack Engineer to champion our efforts in doing so. Virtue AI seeks an experienced Full Stack Engineer who is passionate about AI safety and...
Full Stack Engineer

4 weeks ago

San Francisco, United States Virtue AI Full time

Virtue AI is based in San Francisco and this position will be onsite in San Francisco. About The Role: The future of AI will depend on our ability to keep it safe and responsible. We're seeking an experienced Full Stack Engineer to champion our efforts in doing so. Virtue AI seeks an experienced Full Stack Engineer who is passionate about AI safety and...
Full Stack Engineer

3 weeks ago

San Francisco, United States Virtue AI Full time

Virtue AI is based in San Francisco and this position will be onsite in San Francisco. About The Role: The future of AI will depend on our ability to keep it safe and responsible. We're seeking an experienced Full Stack Engineer to champion our efforts in doing so. Virtue AI seeks an experienced Full Stack Engineer who is passionate about AI safety and...
Senior Manager, AI Engineer

3 weeks ago

San Francisco, United States Capital One Full time

Center 3 (19075), United States of America, McLean, Virginia. Senior Manager, AI Engineer. Overview: At Capital One, we are creating responsible and reliable AI systems, changing banking for good. For years, Capital One has been an industry leader in using machine learning to create real-time, personalized customer experiences. Our investments in technology...
Lead AI Engineer

3 weeks ago

San Francisco, United States Distyl AI, Inc. Full time

Distyl AI develops production-grade AI systems to power core operational workflows for the Fortune 500. Working in partnership with OpenAI, Distyl brings deep expertise in enterprise AI, and technical investments that support the development of production-grade AI systems with rapid time-to-value. Led by proven leaders from top companies like Palantir and...
