Lead Software Engineer, Model Serving Platform
1 day ago
Sciforium is an AI infrastructure company developing next-generation multimodal AI models and a proprietary, high-efficiency serving platform. Backed by multi-million-dollar funding and direct sponsorship from AMD, with hands-on support from AMD engineers, the team is scaling rapidly to build the full stack powering frontier AI models and real-time applications. We offer a fast-moving, collaborative environment where engineers have meaningful impact, learn quickly, and tackle deep technical challenges across the AI systems stack.

Role Overview
This is a rare chance to help architect and lead the development of Sciforium's next-generation model serving platform: the high-performance engine that will bring a multimodal, highly efficient foundation model to market. As a senior technical leader, you'll not only build core components yourself but also guide and mentor other engineers, influencing engineering direction, standards, and execution quality. You will learn and shape the full AI stack, from GPU kernels and quantized execution paths to distributed serving, scheduling, and the APIs that power real-time AI applications. If you enjoy deep systems work, thrive on ownership, and want to lead engineers in building foundational AI infrastructure, this role puts you at the center of Sciforium's mission and growth.

Key Responsibilities
- Lead the technical direction of the model serving platform, owning architecture decisions and guiding engineering execution.
- Build core serving components, including execution runtimes, batching, scheduling, and distributed inference systems.
- Develop high-performance C++ and CUDA/HIP modules, including custom GPU kernels and memory-optimized runtimes.
- Collaborate with ML researchers to productionize new multimodal models and ensure low-latency, scalable inference.
- Build Python APIs and services that expose model capabilities to downstream applications.
- Mentor and support other engineers through code reviews, design discussions, and hands-on technical guidance.
- Drive performance profiling, benchmarking, and observability across the inference stack.
- Ensure high reliability and maintainability through testing, monitoring, and engineering best practices.
- Troubleshoot and resolve complex issues across GPU, runtime, and service layers.

Must-Haves
- Bachelor's degree in Computer Science, Computer Engineering, Electrical Engineering, or equivalent practical experience.
- 5+ years of experience designing and building scalable, reliable backend systems or distributed infrastructure.
- Strong understanding of LLM inference mechanics (prefill vs. decode, batching, KV cache).
- Experience with Kubernetes/Ray and containerization.
- Strong proficiency in C++ and Python.
- Strong debugging, profiling, and performance optimization skills at the system level.
- Ability to collaborate closely with ML researchers and translate model or runtime requirements into production-grade systems.
- Effective communication skills and the ability to lead technical discussions, mentor engineers, and drive engineering quality.
- Comfortable working from the office and contributing to a fast-moving, high-ownership team culture.

Nice to Have
- Experience with ML systems engineering, distributed GPU scheduling, or open-source inference engines such as vLLM, SGLang, or TRT-LLM.
- Experience building large-scale ML/MLOps infrastructure.
- Proficiency in CUDA or ROCm and experience with GPU profiling tools.
- Experience at an AI/ML startup, research lab, or Big Tech infrastructure/ML team.
- Familiarity with multimodal model architectures, raw-byte models, or efficient inference techniques.
- Contributions to open-source ML or HPC infrastructure.
Why Join Us
- Opportunity to build frontier-scale AI infrastructure powering next-generation LLMs and multimodal models.
- Work with top-tier engineers and researchers across systems, GPUs, and ML frameworks.
- Tackle high-impact performance and scalability challenges in training and inference.
- Access state-of-the-art GPU clusters, datasets, and tooling.
- Opportunity to publish, patent, and push the boundaries of modern AI.
- Join a culture of innovation, ownership, and fast execution in a rapidly scaling AI organization.

Benefits Include
- Medical, dental, and vision insurance.
- 401(k) plan.
- Daily lunch, snacks, and beverages.
- Flexible time off.
- Competitive salary and equity.

Equal Opportunity
Sciforium is an equal opportunity employer. All applicants will be considered for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, veteran or disability status.

Seniority level: Mid-Senior level
Employment type: Full-time
Job function: Engineering and Information Technology
Industries: Technology, Information and Internet
Location: San Francisco, CA, United States