AI Inference Systems Architect

5 days ago


San Francisco, California, United States Tbwa ChiatDay Inc Full time
Perplexity Perks

We are seeking a skilled AI Inference Engineer to join our rapidly growing team in the San Francisco Bay Area. With a base salary range of $190,000 to $240,000, this role offers an attractive compensation package.

About the Role

In this position, you will have the opportunity to work on large-scale deployments of machine learning models for real-time inference. Your responsibilities will include developing APIs for AI inference, benchmarking and addressing bottlenecks throughout our inference stack, improving the reliability and observability of our systems, and exploring novel research and implementing LLM inference optimizations.

To be successful in this role, you should have experience with ML systems and deep learning frameworks (e.g., PyTorch, TensorFlow, ONNX), familiarity with common LLM architectures and inference optimization techniques, and experience deploying reliable, distributed, real-time model serving at scale.

What We Offer

As a valued member of our team, you can expect comprehensive health, dental, and vision insurance for you and your dependents, as well as a 401(k) plan. In addition to the base salary, equity is part of the total compensation package.

Get Started

Our mobile apps have been installed over 1 million times across iOS and Android devices, and we've served over 500 million queries from users around the globe. If you're passionate about working on large-scale inference systems and have a proven track record of building them, please apply.



  • San Francisco, California, United States Untether AI Full time

    Software Architect for AI Inference. We are seeking an exceptional Software Architect to join our team at Untether AI, where you will play a key role in designing and developing software that interacts with our innovative chip. As part of our top-notch team, you will collaborate closely with hardware engineers and fellow software engineers to create software...


  • San Francisco, California, United States Abridge AI Inc. Full time

    Unlock the Potential of Healthcare with Abridge. Abridge AI Inc. is revolutionizing the healthcare industry with cutting-edge AI technology, empowering clinicians to focus on patient care while streamlining clinical documentation processes. About the Role: We are seeking an experienced Transformative AI Systems Architect to join our team and play a pivotal role...


  • San Francisco, California, United States Perplexity AI Full time

    We are seeking an experienced Data Inference Specialist to join our team at Perplexity AI. Overview: At Perplexity AI, we've achieved tremendous growth and adoption since launching the world's first fully functional conversational answer engine. Our AI-powered search assistant has amassed 10 million monthly active users, with mobile apps installed over 1...


  • San Francisco, California, United States Tbwa ChiatDay Inc Full time

    We are seeking an experienced AI Inference Deployment Specialist to join our team at Skild AI. As a key member of our robotics team, you will be responsible for deploying cutting-edge AI models and optimizing their performance in real-world environments. Role Overview: In this role, you will work closely with our cross-functional team to design and develop...


  • San Francisco, California, United States Magic AI Full time

    About Magic: Magic is a cutting-edge technology company committed to developing safe Artificial General Intelligence (AGI) that accelerates humanity's progress on the world's most pressing challenges. Our mission revolves around automating research and code generation to improve models and solve alignment more reliably than humans alone. We believe our approach...


  • San Francisco, California, United States Magic AI Full time

    About Magic AI: At Magic AI, we're building safe Artificial General Intelligence (AGI) to accelerate humanity's progress on the world's most pressing challenges. Our approach combines frontier-scale pre-training, domain-specific reinforcement learning, ultra-long context, and inference-time compute to achieve this goal. We're seeking a skilled Distributed...


  • San Francisco, California, United States Perplexity AI Full time

    We are a fast-growing AI company looking for an expert machine learning engineer to join our team. Our current stack is Python, C++, TensorRT-LLM, and Kubernetes. You will have the opportunity to work on large-scale deployment of machine learning models for real-time inference. The ideal candidate should have experience with ML systems and deep learning...


  • San Francisco, California, United States ZipRecruiter Full time

    Unlock Your Potential as a Senior AI Infrastructure Software Architect. Overview: At ZipRecruiter, we're pioneering vertically integrated, purpose-built AI infrastructure solutions trusted by Fortune 500 companies to power their most advanced AI applications. We're redefining AI cloud infrastructure with a mission to align the future of computing with the...


  • San Francisco, California, United States Anyscale Full time

    About Anyscale: Anyscale is a pioneering technology company that empowers software developers to harness the full potential of distributed computing. By commercializing Ray, an open-source project, we're creating an ecosystem of libraries for scalable machine learning. Companies like OpenAI, Uber, Spotify, Instacart, and Cruise trust Ray as a critical...


  • San Francisco, California, United States Together AI Full time

    About the Role: We are seeking an experienced Systems Research Engineer to join our team at Together AI. As a key member of our research-driven artificial intelligence company, you will play a crucial role in researching and building the next-generation AI platform. Company Overview: Together AI is committed to creating open and transparent AI systems that drive...


  • San Francisco, California, United States Magic AI Full time

    Job Overview: Magic AI's mission is to build safe Artificial General Intelligence (AGI) that accelerates humanity's progress on the world's most pressing challenges. We believe the most promising path to safe AGI lies in automating research and code generation to improve models and solve alignment more reliably than humans alone. About the Role: This Senior...


  • San Francisco, California, United States Genmo Inc. Full time

    At Genmo Inc., we are a research lab dedicated to building state-of-the-art models for video generation. Our goal is to unlock the potential of Artificial General Intelligence (AGI). Job Overview: We are seeking a senior/staff software engineer to join our inference team. This role involves designing and scaling our inference systems to support millions of...


  • San Francisco, California, United States Abridge AI Inc. Full time

    Abridge AI Inc. is a pioneering force in healthcare technology, utilizing artificial intelligence to empower deeper understanding and improve clinical documentation efficiency. Role Overview: We are seeking an exceptional ML Systems Engineer to join our team, responsible for scaling and deploying machine learning models to handle increasing traffic demands and...


  • San Francisco, California, United States Perplexity AI Full time

    Company Overview: We're Perplexity AI, a rapidly growing company that has experienced tremendous growth and adoption since launching the world's first fully functional conversational answer engine. Our AI-powered search assistant has amassed 10 million monthly active users, with our mobile apps installed over 1 million times across iOS and Android devices....


  • San Jose, California, United States Capital One Full time

    About Capital One: At Capital One, we are pushing the boundaries of what is possible with AI. We believe that responsible and reliable AI systems can change banking for good. Our team of experts is dedicated to creating innovative solutions that empower our customers and businesses to achieve their goals. About the Role: We are seeking a skilled Distinguished AI...


  • San Francisco, California, United States Abridge Full time

    Abridge is a pioneering healthcare technology company that leverages artificial intelligence to revolutionize medical conversations and clinical documentation. Our mission-driven team is committed to empowering deeper understanding in healthcare through innovative solutions. We are seeking an experienced Chief Healthcare AI Solutions Architect to join our...


  • San Francisco, California, United States Crusoe Energy Inc Full time

    Crusoe Energy Inc is on a mission to unlock value in stranded energy resources through innovative technology. We are inspired by making sure that the energy meeting the demand for data centers is sourced in an environmentally responsible fashion. Crusoe co-locates mobile data centers with stranded energy resources, like flare gas and underloaded renewables,...


  • San Jose, California, United States Capital One Full time

    Transformative AI Expert: We are seeking a visionary Distinguished AI Solutions Architect to join our team at Capital One. As an expert in artificial intelligence, you will play a key role in designing and developing innovative AI solutions that drive business growth and customer satisfaction. About the Role: This is a unique opportunity to work on cutting-edge...


  • San Francisco, California, United States CV Library Full time

    About Kuzco: We are building a large-scale distributed LLM inference network that combines idle GPU capacity from around the world into a single cohesive plane of compute. Our team is a small, well-funded group of staff-level engineers who work together to tackle difficult, high-impact engineering problems in downtown San Francisco. We value creativity alongside...


  • San Francisco, California, United States Crusoe Full time

    About the Role: As a Senior/Staff Software Engineer on the Managed AI team at Crusoe, you'll play a pivotal role in shaping the architecture and scalability of our next-generation AI inference platform. You will lead the design and implementation of core systems for our AI services, including resilient fault-tolerant queues, model catalogs, and scheduling...