ML Infrastructure Engineer

3 days ago

City of Utica, United States Genesis Molecular AI Full time

About The Team

We’re a tight-knit team of proven drug hunters, deep learning researchers, and software engineers united by a common mission: to drive AI innovation in biochemistry, discovering and developing groundbreaking therapies for patients suffering from severe disorders.

The Genesis AI team is focused on developing foundation models for small molecule drug discovery by conducting fundamental research at the intersection of machine learning, physics, and computational chemistry. The team also engineers robust software systems that enable running large-scale simulations and training generative and predictive AI models designed to learn from all kinds of molecular data, leveraging our cluster with 1000s of GPUs and 10,000s of CPUs.

About The Role

We’re seeking experienced ML infrastructure engineers to join the team and lead engineering efforts focused on driving forward our ML research agenda for generative modeling of molecular systems, which is instrumental to our mission.

As an engineer at Genesis, you will lead rapid iteration on our AI platform and infrastructure, unlocking the next level of performance, efficiency, and scale that was not previously possible. You will build massively distributed training and inference pipelines, core MLOps tools and frameworks, and optimize GPU operations to speed up ML models. Genesis is a highly collaborative and cross‑functional environment, and you will work in close partnership with our exceptional engineers, researchers, and scientists.

You Will

Lead engineering efforts on continuous improvement of the AI platform, with an emphasis on rapid build-out of and iteration on scalable and robust distributed infrastructure for ML training, inference, and evaluation.
Support model training and deployment across multiple clusters and multiple clouds, optimizing for throughput and cost.
Optimize the efficiency of ML models and other workloads in terms of latency, throughput, memory consumption, etc. (e.g., via GPU performance engineering), pushing the limits of what’s possible with the current hardware.
Contribute to the long‑term vision for Genesis’ ML platform.
Have the opportunity to mentor and guide more junior members of our technical team as well as research interns, fostering an environment of growth and innovation.

You Are

A strong engineer who constantly strives for technical excellence. You can write clean code and have a deep understanding of the codebases you work in.
Deeply experienced with distributed training and inference of large models on GPU clusters and with some of the core libraries and frameworks we use: PyTorch, PyTorch Lightning, PyTorch Geometric, and Ray.
An independent thinker with a strong sense of ownership and the capability to engineer robust systems from first‑principles‑based conceptualization to state‑of‑the‑art realization.
A curious, problem‑oriented thinker who is excited to dive deep into the emerging field at the intersection of AI, physics, chemistry, and biology and make foundational contributions and discoveries (no previous experience in anything but ML necessary).

Nice to Haves

Experience building, maintaining, and debugging low‑level cluster infrastructure running on multiple clouds using Kubernetes and Terraform.
Experience as a GPU engineer who can quickly figure out performance bottlenecks and architect highly performant code for large-scale ML workloads.
Experience with XLA, Triton, CUDA, or similar accelerator programming languages and/or deep learning compiler stacks.
Experience working with some of the following: molecular systems (protein sequences and 3D structures, small molecules, etc.), ML force fields or other physics‑informed models and methods, or point cloud data in other application domains, such as 3D graphics.

Compensation, Benefits, And Perks

Competitive compensation package that includes salary and equity.
Comprehensive health benefits: Medical, Dental, and Vision (covered 100% for the employees).
401(k) plan.
Open (unlimited) PTO policy.
Free lunches and dinners at our offices.
Paid family leave (maternity and paternity).
Life and long‑ and short‑term disability insurance.

About Genesis Molecular AI

Genesis Molecular AI is pioneering foundation models for molecular AI to unlock a new era of drug design and development. The company’s generative and predictive AI platform, GEMS (Genesis Exploration of Molecular Space), integrates AI and physics into industry‑leading models to generate and optimize drug molecules, including the breakthrough generative diffusion model Pearl for structure prediction. Genesis has raised over $300 million from leading AI, tech, and life science‑focused investors, signed multiple AI‑focused research collaborations with major pharma partners, and is deploying GEMS to advance an internal therapeutics pipeline for a variety of high‑impact targets. Genesis is headquartered in San Mateo, CA, with a fully integrated laboratory in San Diego.

We are proud to be an inclusive workplace and an Equal Opportunity Employer.

Principal Software Engineer, ML Infrastructure

2 weeks ago

Foster City, CA, United States Zoox Full time

Zoox is on a mission to reimagine transportation and ground-up build autonomous robotaxis that are safe, reliable, clean, and enjoyable for everyone. We are still in the early stages of deploying our robotaxis, and it's a great time to join Zoox and make a significant impact on executing this mission. The ML Infrastructure team at Zoox plays a crucial role...
AI/ML Infrastructure Engineer

21 hours ago

Jefferson City, MO, United States Oracle Full time

Job Description We are at the forefront of developing cutting-edge AI solutions that push the boundaries of machine learning, LLM applications, and agentic AI. Our team builds real-world AI systems and deploys scalable, production-ready solutions across Oracle's enterprise customers. We are seeking a highly skilled AI/ML Infrastructure Engineer to design,...
Principal ML Infra Engineer: Scale AI for Drug Discovery

3 days ago

City of Utica, United States Genesis Molecular AI Full time

A biopharmaceutical AI innovation company based in New York is seeking an experienced ML Infrastructure Engineer to enhance their AI platform. This role involves optimizing GPU performance, supporting model training, and contributing to the overall strategic vision of the ML platform within a dynamic, collaborative environment. Candidates should have a...
Senior ML Engineer – ML/Inference

6 days ago

Town of Poland, United States MARA Full time

MARA is redefining the future of sovereign, energy-aware AI infrastructure. We’re building a modular platform that unifies IaaS, PaaS, and SaaS which will enable governments, enterprises, and AI innovators to deploy, scale, and govern workloads across data centers, edge environments, and sovereign clouds. MARA is seeking a Machine Learning Engineer to lead...
Senior ML Inference Engineer: Scalable AI Infrastructure

6 days ago

Town of Poland, United States MARA Full time

An innovative tech company located in New York is seeking a skilled Machine Learning Engineer to lead the development and optimization of AI models for their infrastructure. Ideal candidates will have extensive experience with inference optimization, model serving, and MLOps practices. Responsibilities include managing ML model lifecycles, designing scalable...
Senior AI Infrastructure Lead

1 day ago

Town of Florida, United States Sphere Full time

A global logistics technology partner is looking for an AI Infrastructure Engineer to build and maintain scalable AI infrastructure. You will enable teams to run ML experiments, deploy ML models, and implement MLOps pipelines. The role includes designing distributed training pipelines, optimizing cloud resources, and monitoring model performance. Candidates...
Senior ML Infrastructure Engineer: Scale GPU Training

3 days ago

Redwood City, United States Dyna Robotics Full time

A cutting-edge robotics company based in California is looking for an experienced Machine Learning Infrastructure Engineer. This role involves designing scalable ML training platforms, optimizing high-performance computing systems, and ensuring robust job scheduling and reliability. Ideal candidates will have 7+ years in software with hands-on experience in...
Senior ML Infrastructure Engineer: Scale GPU Training

2 weeks ago

Redwood City, United States Dyna Robotics Full time

A cutting-edge robotics company based in California is looking for an experienced Machine Learning Infrastructure Engineer. This role involves designing scalable ML training platforms, optimizing high-performance computing systems, and ensuring robust job scheduling and reliability. Ideal candidates will have 7+ years in software with hands-on experience in...
Lead AI Cloud Infrastructure Engineer — Scale ML Platforms

6 days ago

Jersey City, United States JPMorgan Chase & Co. Full time

A leading financial institution seeks a Lead Software Engineer for AI Cloud Infrastructure in Jersey City, NJ. The role requires 5+ years of experience in software engineering and hands-on skills in Python and automation. Responsibilities include architecting scalable AI solutions, creating deployment tools for ML models, and mentoring junior engineers. A...
Hybrid ML Infrastructure Engineer for Computer Vision

4 weeks ago

Redwood City, CA, United States The Mice Groups, Inc. Full time

Overview: Make your application after reading the following skill and qualification requirements for this position. Job Title: Machine Learning Infrastructure Engineer – Computer Vision & AI. Employer: The Mice Groups, Inc. Location: Redwood City, CA (Hybrid). Employment type: Contract-to-Hire or Full-Time. Compensation: Base pay range: $150,000.00/yr -...
