ML Infrastructure Engineer

1 week ago

California, United States Jobgether Full time

This position is posted by Jobgether on behalf of a partner company. We are currently looking for an ML Infrastructure Engineer (Staff / Principal) in California (USA).

This role offers the opportunity to lead the development and optimization of cutting‑edge ML infrastructure for large‑scale generative and predictive AI models. You will work at the intersection of machine learning, physics, and computational chemistry, driving scalable, high‑performance systems that accelerate AI research in molecular modeling. The position involves designing distributed training pipelines, optimizing GPU operations, and building robust MLOps frameworks that push the boundaries of AI performance. You will collaborate closely with researchers, engineers, and scientists, mentoring junior team members while contributing to long‑term technical strategy. This is a hands‑on, high‑impact role where your work directly enables groundbreaking discoveries in molecular AI.

Accountabilities
Lead engineering efforts for building and scaling distributed ML training and inference infrastructure across GPU clusters and cloud environments.
Optimize model efficiency in terms of throughput, latency, memory, and GPU utilization, pushing hardware to its performance limits.
Design and implement MLOps tools and frameworks for automated, reliable deployment and evaluation of AI models.
Collaborate with researchers and cross‑functional teams to integrate infrastructure with generative and predictive AI workflows.
Drive long‑term platform vision, contributing to architectural decisions, tooling improvements, and best practices.
Mentor junior engineers and research interns, fostering a culture of technical excellence and innovation.

Requirements
Extensive experience in distributed ML training and inference on large‑scale GPU clusters.
Proficiency in PyTorch, PyTorch Lightning, PyTorch Geometric, Ray, or similar frameworks.
Strong engineering skills with the ability to design, implement, and maintain robust, scalable systems.
Experience optimizing GPU workloads and performance engineering for high‑throughput ML pipelines.
Independent thinker with a strong sense of ownership and ability to deliver from first principles to production‑quality systems.
Curiosity and problem‑solving mindset for working at the intersection of AI, physics, chemistry, and biology.

Nice to Have
Experience building and maintaining cluster infrastructure with Kubernetes and Terraform.
Expertise in GPU programming, XLA, Triton, CUDA, or deep learning compiler stacks.
Familiarity with molecular systems (proteins, small molecules, 3D structures), ML force fields, or point cloud data.
Experience contributing to highly collaborative, cross‑functional teams in research or production ML environments.

Benefits
Competitive salary and equity package.
Comprehensive health benefits: medical, dental, and vision fully covered for employees.
401(k) plan.
Open (unlimited) PTO policy and paid family leave (maternity and paternity).
Life, long‑term, and short‑term disability insurance.
Free meals at office locations and other employee perks.
Opportunities for growth, mentorship, and hands‑on impact in cutting‑edge molecular AI research.

Jobgether is a Talent Matching Platform that partners with companies worldwide to efficiently connect top talent with the right opportunities through AI‑driven job matching.

When you apply, your profile goes through our AI‑powered screening process designed to identify top talent efficiently and fairly.
🔍 Our AI evaluates your CV and LinkedIn profile thoroughly, analyzing your skills, experience, and achievements.
📊 It compares your profile to the job’s core requirements and past success factors to determine your match score.
🎯 Based on this analysis, we automatically shortlist the three candidates with the highest match to the role.
🧠 When necessary, our human team may perform an additional manual review to ensure no strong profile is missed.

The process is transparent, skills‑based, and free of bias, focusing solely on your fit for the role. Once the shortlist is completed, we share it directly with the company that owns the job opening. The final decision and next steps (such as interviews or additional assessments) are then made by their internal hiring team.

Thank you for your interest.

Senior AI/ML Data Engineer

2 weeks ago

California, United States Simarn Solutions Full time

Job Title: Senior AI/ML Data Engineer Location: California (Onsite) Job Type: C2C Position Overview We are seeking a highly skilled Senior AI/ML Data Engineer with expertise in multimodal AI models, vector databases, and Azure-based data engineering solutions. The candidate will lead design and implementation of scalable AI-driven data platforms,...
Tech Lead Manager, ML Training Infrastructure

3 days ago

Mountain View, California (HQ), United States Nuro Full time $235,030 - $352,290 per year

Who We Are Nuro is a self-driving technology company on a mission to make autonomy accessible to all. Founded in 2016, Nuro is building the world's most scalable driver, combining cutting-edge AI with automotive-grade hardware. Nuro licenses its core technology, the Nuro Driver, to support a wide range of applications, from robotaxis and commercial fleets...
MTS, Data Infrastructure Engineer

4 weeks ago

California, United States Delphina Full time

About Delphina Today's Data Scientists are in pain - spending their time manually wrangling data, building models through slow trial and error, taking on painstaking rewrites for deployment, and dealing with countless other frustrating bottlenecks. The tools they are using for much of this work, e.g. Jupyter notebooks and Pandas, are over a decade old. We...
Sr. Staff Engineer, Hardware Infrastructure Lead

2 weeks ago

California, United States Tenstorrent Full time

Tenstorrent is leading the industry on cutting-edge AI technology, revolutionizing performance expectations, ease of use, and cost efficiency. With AI redefining the computing paradigm, solutions must evolve to unify innovations in software models, compilers, platforms, networking, and semiconductors. Our diverse team of technologists have developed a high...
Founding Engineer

4 weeks ago

California, United States Godela Full time

Overview At Godela, we're building the first Physics Foundation Model: an AI system that learns from simulation, experiment, and equations to instantly predict and simulate physical behavior. Our mission is to give every engineer an R&D lab at their fingertips, cutting months of simulation and experimentation into minutes. We are looking for people who get...
Founding Senior Software Engineer

2 weeks ago

California, United States Giga ML Full time

Location: San Francisco Experience: 5-8 years Salary: $200K - $300K (Base) and 0.5%+ in Equity About GigaML At GigaML, we’re revolutionizing enterprise customer support by deploying AI agents that resolve over 1 million customer tickets monthly via voice and chat. Industry leaders like Postman and Zepto (YC’s fastest-growing company) trust our AI to navigate...
AI/ML Engineer

4 weeks ago

California, United States Rulebase Full time

Why you should join us We are building an autonomous factory for financial service agents, with real-time feedback loops from some of the largest financial services companies in the world, our customers. We work with structured and unstructured data at scale including calls, chats, emails, and AI agent responses to build autonomous systems that adapt,...
Founding Senior Software Engineer

6 days ago

California, United States Giga ML Full time

Location: San Francisco Experience: 5-8 years Salary: $200K - $300K (Base) and 0.5%+ in Equity About GigaML At GigaML, we're revolutionizing enterprise customer support by deploying AI agents that resolve over 1 million customer tickets monthly via voice and chat. Industry leaders like Postman and Zepto (YC's fastest-growing company) trust our AI to navigate...
Staff Infrastructure Engineer

4 weeks ago

California, United States Salient Full time

We are hiring a Staff Infrastructure Engineer to design, build, and operate scalable, production-grade infrastructure from the ground up for enterprise, big bank clients with critical data. You'll own and develop our deployment pipelines, observability systems, and cloud infrastructure as we transition to a Kubernetes-based architecture. This is an onsite...
