Senior Inference Systems Engineer

2 months ago

San Francisco, California, United States Genmo Inc. Full time

Genmo Inc. is a research lab dedicated to building state-of-the-art models for video generation. Our goal is to unlock the potential of Artificial General Intelligence (AGI).

Job Overview

We are seeking a senior/staff software engineer to join our inference team. This role involves designing and scaling our inference systems to support millions of users across 20+ data centers.

Key Responsibilities:

Develop high-performance, low-latency inference pipelines using advanced technologies.
Create scalable backend services that support our AI-powered content creation platform.
Implement model serving infrastructure using Kubernetes and other cloud-native solutions.
Collaborate with ML engineers to transition models from research to production environments.
Design APIs for integrating our AI capabilities into partner ecosystems.
Implement monitoring, logging, and alerting systems for backend services and model inference.
Develop monitoring infrastructure for our ML serving pipeline and apply advanced model compression techniques to improve inference performance.

Qualifications:

Bachelor's or Master's degree in Computer Science, Software Engineering, or a related field.
5+ years of experience in software engineering, with at least 3 years focusing on backend systems and ML infrastructure.
Strong experience with Ray or Kubernetes.
Proficiency in Python and at least one systems programming language (Rust, C++, or Go).
Solid understanding of model serving frameworks (e.g., TensorFlow Serving, NVIDIA Triton).
Experience with an ML framework such as TensorFlow, PyTorch, or JAX.
Experience with model compression and optimization techniques.
Strong knowledge of cloud platforms (AWS, GCP, or Azure) and their ML-specific services.
Familiarity with distributed systems and microservices architectures.
Experience with high-performance, low-latency systems.

Salary:

The estimated annual salary for this position is around $200,000, based on industry standards and the Bay Area location.

Senior Inference Systems Engineer

2 weeks ago

San Francisco, California, United States Genmo Full time

We are seeking a highly skilled Senior Inference Systems Engineer to join our team at Genmo, a research lab dedicated to building open, state-of-the-art models for video generation. Job Summary: The successful candidate will be responsible for designing and scaling our inference systems as they grow to support millions of users across more than 20...
Senior Inference Architect

1 week ago

San Francisco, California, United States OpenAI Full time

We are seeking a Senior Inference Architect to join our team at OpenAI. As a key member of our infrastructure engineering team, you will be responsible for scaling up our critical inference infrastructure, which efficiently services every customer request to use our state-of-the-art AI models. In this role, you will work alongside machine learning...
Machine Learning Engineer

2 weeks ago

San Francisco, California, United States Together AI Full time

About the Role: We are looking for a talented Machine Learning Engineer to join our team at Together AI. As an MLOps engineer, you will develop systems and APIs that enable our customers to perform inference and fine-tune LLMs. Responsibilities: Develop and deploy systems and APIs that enable customers to perform inference and fine-tune LLMs. Work closely...
Senior Software Engineer for Scalable AI Inference Systems

2 weeks ago

San Francisco, California, United States Genmo Full time

Role Overview: We are seeking a seasoned software engineer to join our inference team at Genmo, a pioneering research lab focused on developing cutting-edge models for video generation. This role presents an exciting opportunity to shape the future of AI and push the boundaries of what's possible in video generation. In this position, you will be responsible...
Machine Learning Engineer

3 weeks ago

San Francisco, California, United States Perplexity AI Full time

Job Description: We are seeking an AI Inference Engineer to join our growing team. As a key member of our engineering team, you will have the opportunity to work on large-scale deployment of machine learning models for real-time inference. Benchmark and address bottlenecks throughout our inference stack. Improve the reliability and observability of our systems...
Senior AI Engineer for Multimodal LLMs

4 weeks ago

San Francisco, California, United States Waveforms Full time

Unlock the Future of Audio Intelligence. We are seeking a skilled Senior AI Engineer to join our team at WaveForms AI, an innovative company revolutionizing human-AI interactions with advanced audio large language models (LLMs). About the Role: This is an exceptional opportunity to contribute to cutting-edge AI systems that transform user experiences across...
Data Inference Specialist

1 month ago

San Francisco, California, United States Perplexity AI Full time

We are seeking an experienced Data Inference Specialist to join our team at Perplexity AI. Overview: At Perplexity AI, we've achieved tremendous growth and adoption since launching the world's first fully functional conversational answer engine. Our AI-powered search assistant has amassed 10 million monthly active users, with mobile apps installed over 1...
Expert Machine Learning Engineer for Real-Time Inference

1 month ago

San Francisco, California, United States Perplexity AI Full time

We are a fast-growing AI company looking for an expert machine learning engineer to join our team. Our current stack is Python, C++, TensorRT-LLM, and Kubernetes. You will have the opportunity to work on large-scale deployment of machine learning models for real-time inference. The ideal candidate should have experience with ML systems and deep learning...
Inference Infrastructure Engineer

1 week ago

San Francisco, California, United States OpenAI Full time

About OpenAI: OpenAI is a pioneering AI research and deployment company dedicated to ensuring that general-purpose artificial intelligence benefits all of humanity. We push the boundaries of the capabilities of AI systems and seek to safely deploy them to the world through our products. We believe artificial intelligence has the potential to help people solve...
Machine Learning Inference Specialist

3 weeks ago

San Francisco, California, United States Perplexity AI Full time

Company Overview: Perplexity AI is a leading innovator in the field of conversational answer engines, boasting 10 million monthly active users and serving over 500 million queries worldwide. We've experienced tremendous growth since publicly launching our fully functional search assistant just over a year ago and have raised significant funding from top...
Inference Performance Specialist

2 weeks ago

San Francisco, California, United States Liquid AI Full time

Job Description: We are looking for a talented Senior Optimization Engineer to join our team and help us develop highly optimized ML inference stacks for various hardware platforms. The successful candidate will have extensive experience in coding, with expertise in Python, PyTorch, CUDA, and C++. They should be able to work independently, taking ownership of...
Inference Stack Architect

2 weeks ago

San Francisco, California, United States Liquid AI Full time

Harness Machine Learning Potential: As a key member of our team, you'll play a vital role in shaping the future of machine learning at Liquid AI. With a competitive salary range of $150,000 - $170,000 per annum, depending on experience and qualifications, you'll have the opportunity to grow professionally and make a meaningful impact. Job Description: Our...
High-Performance Inference Engineer

2 weeks ago

San Francisco, California, United States Liquid AI Full time

About the Role: As we prepare to deploy our models across various device types, including GPUs, CPUs, and NPUs, we're seeking an expert who can optimize inference stacks tailored to each platform. We're looking for someone with exceptional technical skills and a deep understanding of GPU, CPU, and NPU architectures. The ideal candidate is a highly skilled...
AI Inference Software Architect

1 month ago

San Francisco, California, United States Untether AI Full time

Software Architect for AI Inference: We are seeking an exceptional Software Architect to join our team at Untether AI, where you will play a key role in designing and developing software that interacts with our innovative chip. As part of our top-notch team, you will collaborate closely with hardware engineers and fellow software engineers to create software...
AI Inference Deployment Specialist

2 months ago

San Francisco, California, United States Tbwa ChiatDay Inc Full time

We are seeking an experienced AI Inference Deployment Specialist to join our team at Skild AI. As a key member of our robotics team, you will be responsible for deploying cutting-edge AI models and optimizing their performance in real-world environments. Role Overview: In this role, you will work closely with our cross-functional team to design and develop...
Senior AI Model Optimization Engineer

2 weeks ago

San Francisco, California, United States Lumicity Full time

About Lumicity: We are a pioneering company in generative video models, pushing the boundaries of AI innovation. With a strong presence in San Francisco and over $10M in funding, we're expanding our team to tackle cutting-edge challenges. Salary: $180,000 - $220,000 per annum. The Role: We're seeking a highly skilled Senior AI Model Optimization Engineer to join...
Data-Driven Inference Architect

2 weeks ago

San Francisco, California, United States Lumicity Full time

About Lumicity: We are a San Francisco-based company developing innovative generative video models that allow users to create animated pictures with ease. Our goal is to push the boundaries of AI-driven video creation, and we're looking for talented engineers to join our team. Salary: $200,000 - $250,000 per annum. The Job Description: We're seeking a Data-Driven...
Large Scale Inference Specialist

2 weeks ago

San Francisco, California, United States Anyscale Full time

Role Overview: This is a critical role at Anyscale as it allows us to provide market-leading performance and price point for AI infrastructure. As a Large Scale Inference Specialist, you will help push the boundaries of performance for inference at large scale. This involves iterating quickly with product teams to ship end-to-end solutions for Batch and...
Senior GenAI Engineer

2 weeks ago

San Francisco, California, United States Amazon Full time

About the Position: We are seeking a Senior GenAI Specialist Solutions Architect to join our team at Amazon. This role will be responsible for designing and implementing cloud-based solutions that leverage Generative AI (GenAI) technologies. As a Senior GenAI Specialist Solutions Architect, you will work closely with our engineering teams to develop...
Senior Machine Learning Engineer

2 weeks ago

San Francisco, California, United States Recruiting from Scratch Full time

About the Job: We are seeking an experienced Senior ML Infrastructure Engineer to join our team at Recruiting from Scratch in San Francisco, CA. Job Overview: We are looking for a highly skilled engineer to design and implement large-scale, fault-tolerant systems for our inference network. The ideal candidate will have experience working with distributed...
