Machine Learning Infrastructure Specialist

20 hours ago


San Francisco, California, United States Recruiting from Scratch Full time

We are scaling our inference systems to handle millions of LLM requests daily and are hiring an engineer to help build them.

This role involves designing and implementing large-scale, fault-tolerant systems for AI infrastructure. Key responsibilities include:

  • Architecting distributed systems for our inference network.
  • Developing resource allocation models across heterogeneous hardware.
  • Optimizing network performance metrics (latency, throughput, availability).
  • Building robust monitoring and observability systems.

The ideal candidate will have 5+ years of experience building high-performance, scalable distributed systems, as well as strong programming skills in TypeScript, Python, and either Go, Rust, or C++.

Experience with Kubernetes/Nomad orchestration, AI tooling (ChatGPT, Claude, Cursor), and GPU programming (CUDA) is a plus. Startup experience (pre-seed to Series A) is required.

The salary range is $180K to $300K plus equity (0.1-3%). Visa sponsorship is available. The role is based in San Francisco, CA.



  • San Francisco, California, United States Unreal Gigs Full time

    Machine Learning Infrastructure Specialist Wanted at Unreal Gigs. We are looking for a skilled Machine Learning Infrastructure Specialist to join our team at Unreal Gigs. As an expert in designing and implementing scalable AI systems, you will play a critical role in driving business innovation. About the Role: The successful candidate will have a strong...


  • San Francisco, California, United States ZipRecruiter Full time

    About Us: Welcome to ZipRecruiter, where we're pioneering AI-driven innovation. We're committed to building robust infrastructure that powers our machine learning models at scale. As a Senior Machine Learning Infrastructure Engineer, you'll lead the charge, designing, developing, and optimizing our machine learning infrastructure. What You'll Do: Design...


  • San Jose, California, United States Adobe Full time

    We are seeking an experienced Machine Learning Infrastructure Specialist to join our team at Adobe. In this role, you will design, develop, and maintain robust AI/ML infrastructure solutions to support the training and deployment of large-scale AI models. Key responsibilities include: Developing high-quality, product-level code that is easy to...


  • San Francisco, California, United States Unreal Gigs Full time

    Company Overview: At Unreal Gigs, we're driving the future of AI innovation by building cutting-edge machine learning infrastructure. Our team is dedicated to developing robust and scalable systems that power our models at scale. Position Overview: As a Senior Machine Learning Infrastructure Engineer, you'll lead the design and development of our machine...


  • San Francisco, California, United States Unreal Gigs Full time

    Unreal Gigs Overview: Welcome to Unreal Gigs, a pioneering force in AI-driven innovation. We're committed to building robust infrastructure that powers our machine learning models at scale. Salary: $195,000 - $255,000 per year. Position Summary: We're seeking a seasoned Senior Machine Learning Infrastructure Engineer to lead the design, development, and...


  • San Francisco, California, United States Unreal Gigs Full time

    At Unreal Gigs, we're on the cutting edge of AI-driven innovation. As a Senior Machine Learning Infrastructure Engineer, you'll lead the design, development, and optimization of our machine learning infrastructure. About the Role: You'll work on challenging projects, from building scalable data pipelines to deploying and managing machine learning models in...


  • San Francisco, California, United States ZipRecruiter Full time

    Job Title: Cloud Engineering Manager - AI. We are seeking a seasoned Cloud Engineering Manager with expertise in Artificial Intelligence and Machine Learning to lead our cloud infrastructure initiatives. As a Cloud Engineering Manager, you will oversee the design, development, and optimization of our cloud-based infrastructure solutions to support machine...


  • San Francisco, California, United States Unreal Gigs Full time

    Unreal Gigs: We are looking for a highly skilled Machine Learning Infrastructure Architect to lead our MLOps strategy and build the backbone of our AI operations. About the Role: As a Machine Learning Infrastructure Architect, you will be responsible for designing and implementing scalable, secure, and efficient MLOps infrastructure that...


  • San Francisco, California, United States Unreal Gigs Full time

    About the Role: Unreal Gigs is a trailblazer in AI-driven innovation, and we're seeking a seasoned leader to drive our machine learning infrastructure initiatives. Key Responsibilities: Technical Leadership: Provide strategic guidance, mentorship, and technical leadership to a team of machine learning infrastructure engineers, fostering a culture of excellence,...


  • San Francisco, California, United States Flip Full time

    About Flip.shop: Welcome to Flip.shop, where innovation meets the social commerce revolution. Our Series C funding round has propelled our valuation to an impressive $1.05 billion, and we're redefining the shopping experience by giving consumers a voice in a space dominated by tech giants. Opportunities at Flip.shop: This isn't just a job; it's a chance to build...


  • San Francisco, California, United States Unreal Gigs Full time

    Job Overview. Company Background: Welcome to Unreal Gigs, a leading innovator in machine learning infrastructure. Our mission is to empower data scientists and engineers with cutting-edge technology and expertise. Position Summary: As a Machine Learning Infrastructure Solutions Architect, you will play a critical role in designing and optimizing our machine...


  • San Francisco, California, United States Anyscale Full time

    About Anyscale: We're a leading provider of distributed computing solutions, dedicated to empowering software developers with accessible and scalable tools. Our mission is to democratize distributed computing and make it accessible to developers of all skill levels. We're commercializing Ray, a popular open-source project that's creating an ecosystem of...


  • San Francisco, California, United States ZipRecruiter Full time

    About the Role: We are seeking a highly skilled Machine Learning Operations Specialist to join our team at ZipRecruiter. As an MLOps specialist, you will be responsible for designing, automating, and managing robust machine learning pipelines that power AI-driven products. With a strong background in DevOps and cloud infrastructure management, you will work...


  • San Francisco, California, United States Anyscale Full time

    About the Job: We are seeking a highly skilled engineer to join our distributed training team. As a Machine Learning Infrastructure Engineer at Anyscale, you will play a key role in shaping the future of ML training infrastructure. You will work closely with our team to develop and maintain widely adopted open-source machine learning libraries, including Ray...


  • San Francisco, California, United States OpenAI Full time

    We are seeking a visionary Machine Learning Infrastructure Architect to join our team at OpenAI in San Francisco, CA. This role involves designing and maintaining robust and secure systems that power the training and advanced use cases of next-gen AI models. You will work closely with researchers to enhance system capabilities and support experimental and...


  • San Diego, California, United States Apixio Full time

    About the Role: The Senior MLOps Engineer will play a critical role in operationalizing and automating machine learning workflows, ensuring scalability, reliability, and efficiency. As part of our team, you will collaborate closely with data scientists, software engineers, and DevOps teams to deploy, monitor, and manage machine learning models in production...


  • South San Francisco, California, United States Genentech Full time

    About the Role: We're looking for a Machine Learning Infrastructure Lead to join our team at Genentech Computational Sciences. As a key member of our Prescient Design group, you'll play a leading role in developing and maintaining large-scale machine learning models and infrastructure. About the Responsibilities: This role involves: Contributing to cutting-edge...


  • San Francisco, California, United States ZipRecruiter Full time

    Job Overview: A highly skilled and experienced Chief Machine Learning Infrastructure Architect is sought to lead our MLOps efforts, focusing on designing and implementing scalable infrastructure for deploying, monitoring, and managing machine learning models at scale. This role requires a deep understanding of machine learning concepts, strong technical...


  • San Francisco, California, United States Unreal Gigs Full time

    Company Overview: At Unreal Gigs, we're at the forefront of AI-driven innovation. We're committed to building robust infrastructure that powers our machine learning models at scale.


  • San Francisco, California, United States Sentry Full time

    About the Role: As a Senior Machine Learning Systems Engineer at Sentry, you will play a pivotal role in shaping the company's AI/ML landscape. Your primary responsibility will be to design and build the core infrastructure required for developing, evaluating, deploying, and iterating on models and pipelines at scale. This position is crucial as it involves...