Lead Software Engineer, Model Serving Platform

3 days ago

San Francisco, California, United States | Sciforium | Full time

Sciforium is an AI infrastructure company developing next-generation multimodal AI models and a proprietary, high-efficiency serving platform. Backed by multi-million-dollar funding and direct sponsorship from AMD, with hands-on support from AMD engineers, the team is scaling rapidly to build the full stack powering frontier AI models and real-time applications.

We offer a fast-moving, collaborative environment where engineers have meaningful impact, learn quickly, and tackle deep technical challenges across the AI systems stack.

Role Overview
This is a rare chance to help architect and lead the development of Sciforium's next-generation model serving platform, the high-performance engine that will bring a multimodal, highly efficient foundation model to market. As a senior technical leader, you'll not only build core components yourself but also guide and mentor other engineers, influencing engineering direction, standards, and execution quality.

You will learn and shape the full AI stack: from GPU kernels and quantized execution paths to distributed serving, scheduling, and the APIs that power real-time AI applications. If you enjoy deep systems work, thrive on ownership, and want to lead engineers in building foundational AI infrastructure, this role puts you at the center of Sciforium's mission and growth.

Key Responsibilities

Lead the technical direction of the model serving platform, owning architecture decisions and guiding engineering execution.
Build core serving components including execution runtimes, batching, scheduling, and distributed inference systems.
Develop high-performance C++ and CUDA/HIP modules, including custom GPU kernels and memory-optimized runtimes.
Collaborate with ML researchers to productionize new multimodal models and ensure low-latency, scalable inference.
Build Python APIs and services that expose model capabilities to downstream applications.
Mentor and support other engineers through code reviews, design discussions, and hands-on technical guidance.
Drive performance profiling, benchmarking, and observability across the inference stack.
Ensure high reliability and maintainability through testing, monitoring, and engineering best practices.
Troubleshoot and resolve complex issues across GPU, runtime, and service layers.

Must-Haves

Bachelor's degree in Computer Science, Computer Engineering, Electrical Engineering, or equivalent practical experience.
5+ years of experience designing and building scalable, reliable backend systems or distributed infrastructure.
Strong understanding of LLM inference mechanics (prefill vs. decode, batching, KV cache).
Experience with Kubernetes or Ray, and with containerization.
Strong proficiency in C++ and Python.
Strong debugging, profiling, and performance optimization skills at the system level.
Ability to collaborate closely with ML researchers and translate model or runtime requirements into production-grade systems.
Effective communication skills and the ability to lead technical discussions, mentor engineers, and drive engineering quality.
Comfortable working from the office and contributing to a fast-moving, high-ownership team culture.

Nice to Have

Experience with ML systems engineering, distributed GPU scheduling, or open-source inference engines such as vLLM, SGLang, or TRT-LLM.
Experience building large-scale ML/MLOps infrastructure.
Proficiency in CUDA or ROCm and experience with GPU profiling tools.
Experience at an AI/ML startup, research lab, or Big Tech infrastructure/ML team.
Familiarity with multimodal model architectures, raw-byte models, or efficient inference techniques.
Contributions to open-source ML or HPC infrastructure.

Why Join Us

Opportunity to build frontier-scale AI infrastructure powering next-generation LLMs and multimodal models.
Work with top-tier engineers and researchers across systems, GPUs, and ML frameworks.
Tackle high-impact performance and scalability challenges in training and inference.
Access state-of-the-art GPU clusters, datasets, and tooling.
Opportunity to publish, patent, and push the boundaries of modern AI.
Join a culture of innovation, ownership, and fast execution in a rapidly scaling AI organization.

Benefits Include

Medical, dental, and vision insurance
401(k) plan
Daily lunch, snacks, and beverages
Flexible time off
Competitive salary and equity

Equal opportunity
Sciforium is an equal opportunity employer. All applicants will be considered for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, veteran status, or disability status.
