Lead Software Engineer, Model Serving Platform
3 days ago
Sciforium is an AI infrastructure company developing next-generation multimodal AI models and a proprietary, high-efficiency serving platform. Backed by multi-million-dollar funding and direct sponsorship from AMD, with hands-on support from AMD engineers, the team is scaling rapidly to build the full stack powering frontier AI models and real-time applications.
We offer a fast-moving, collaborative environment where engineers have meaningful impact, learn quickly, and tackle deep technical challenges across the AI systems stack.
Role Overview
This is a rare chance to help architect and lead the development of Sciforium's next-generation model serving platform, the high-performance engine that will bring a multimodal, highly efficient foundation model to market. As a senior technical leader, you'll not only build core components yourself but also guide and mentor other engineers, influencing engineering direction, standards, and execution quality.
You will learn and shape the full AI stack: from GPU kernels and quantized execution paths to distributed serving, scheduling, and the APIs that power real-time AI applications. If you enjoy deep systems work, thrive on ownership, and want to lead engineers in building foundational AI infrastructure, this role puts you at the center of Sciforium's mission and growth.
Key Responsibilities
- Lead the technical direction of the model serving platform, owning architecture decisions and guiding engineering execution.
- Build core serving components including execution runtimes, batching, scheduling, and distributed inference systems.
- Develop high-performance C++ and CUDA/HIP modules, including custom GPU kernels and memory-optimized runtimes.
- Collaborate with ML researchers to productionize new multimodal models and ensure low-latency, scalable inference.
- Build Python APIs and services that expose model capabilities to downstream applications.
- Mentor and support other engineers through code reviews, design discussions, and hands-on technical guidance.
- Drive performance profiling, benchmarking, and observability across the inference stack.
- Ensure high reliability and maintainability through testing, monitoring, and engineering best practices.
- Troubleshoot and resolve complex issues across GPU, runtime, and service layers.
Must-Haves
- Bachelor's degree in Computer Science, Computer Engineering, Electrical Engineering, or equivalent practical experience.
- 5+ years of experience designing and building scalable, reliable backend systems or distributed infrastructure.
- Strong understanding of LLM inference mechanics (prefill vs. decode, batching, KV cache).
- Experience with Kubernetes or Ray and with containerization.
- Strong proficiency in C++ and Python.
- Strong debugging, profiling, and performance optimization skills at the system level.
- Ability to collaborate closely with ML researchers and translate model or runtime requirements into production-grade systems.
- Effective communication skills and the ability to lead technical discussions, mentor engineers, and drive engineering quality.
- Comfortable working from the office and contributing to a fast-moving, high-ownership team culture.
Nice to Have
- Experience with ML systems engineering, distributed GPU scheduling, or open-source inference engines such as vLLM, SGLang, or TRT-LLM.
- Experience building large-scale ML/MLOps infrastructure.
- Proficiency in CUDA or ROCm and experience with GPU profiling tools.
- Experience at an AI/ML startup, research lab, or Big Tech infrastructure/ML team.
- Familiarity with multimodal model architectures, raw-byte models, or efficient inference techniques.
- Contributions to open-source ML or HPC infrastructure.
Why Join Us
- Opportunity to build frontier-scale AI infrastructure powering next-generation LLMs and multimodal models.
- Work with top-tier engineers and researchers across systems, GPUs, and ML frameworks.
- Tackle high-impact performance and scalability challenges in training and inference.
- Access state-of-the-art GPU clusters, datasets, and tooling.
- Opportunity to publish, patent, and push the boundaries of modern AI.
- Join a culture of innovation, ownership, and fast execution in a rapidly scaling AI organization.
Benefits Include
- Medical, dental, and vision insurance
- 401(k) plan
- Daily lunch, snacks, and beverages
- Flexible time off
- Competitive salary and equity
Equal opportunity
Sciforium is an equal opportunity employer. All applicants will be considered for employment without attention to race, color, religion, sex, sexual orientation, gender identity, national origin, veteran or disability status.