Software Engineer, ML Infrastructure

4 weeks ago


San Francisco, United States Scale AI, Inc. Full time

As a software engineer on the ML Infrastructure team, you will work on developing the platform for orchestrating post-training and model evaluation jobs. At Scale, we are constantly developing new data sources and running experiments to understand their impact on ML models. To support this effort, we are looking for engineers who are comfortable navigating cloud infrastructure challenges as well as research challenges in benchmarking and tuning LLMs.


The ideal candidate has strong fundamentals in machine learning and backend system design, along with prior ML infrastructure experience. They should also be comfortable with infrastructure and large-scale system design, and with diagnosing both model performance problems and system failures.


You will:
  • Develop reusable platforms for running in-house and open-source LLM benchmarks.
  • Ensure correctness and performance of post-training and eval jobs on the platform.
  • Improve APIs for managing ML workflows.
  • Contribute to foundational infrastructure at the company for model inference and training.
  • Participate in our team's on-call process to ensure the availability of our services.
  • Own projects end-to-end, from requirements, scoping, and design through implementation, in a highly collaborative and cross-functional environment.

Ideally you'd have:
  • 4+ years of experience developing ML platforms.
  • Passion for working closely with researchers to drive business impact.
  • Experience training and/or benchmarking LLMs.
  • Experience with Python, Docker, Kubernetes, and infrastructure as code (e.g. Terraform).

Nice to haves:
  • Experience building, deploying, and monitoring complex microservice architectures.
  • Experience working with a cloud technology stack (e.g. AWS or GCP).

  • San Francisco, California, United States University of California - San Francisco Campus and Health Full time

    Job Summary: The senior software engineer will lead the development, implementation, and maintenance of computing and data infrastructure to support the deployment and monitoring of Machine Learning (ML) and generative Artificial Intelligence (AI) tools at UCSF Health. This includes leading the Health IT Platform for Advanced Computing (HIPAC), a cloud...


  • San Francisco, California, United States United Software Group Full time

    Job Title: Data Engineering Lead - AI/ML Operations. We are seeking a highly skilled Data Engineering Lead with expertise in AI/ML operations to join our team at United Software Group. This role will be responsible for overseeing the development and implementation of data engineering solutions that enable the company's AI/ML initiatives. About the Role: The...


  • San Francisco, United States Abridge AI Inc. Full time

    Abridge was founded in 2018 with the mission of powering deeper understanding in healthcare. Our AI-powered platform was purpose-built for medical conversations, improving clinical documentation efficiencies while enabling clinicians to focus on what matters most: their patients. Our enterprise-grade technology transforms patient-clinician conversations into...


  • San Francisco, United States Recruiting from Scratch Full time

    Who is Recruiting from Scratch: Recruiting from Scratch is a talent firm that focuses on placing the best candidate for our clients. Our team is 100% remote and we work with teams across North America, South America, and Europe to help them hire. Senior ML Infrastructure Engineer | AI Infrastructure Scale-Up | SF Based Base: $180K - $300K + Equity (0.1-3%)...


  • San Francisco, California, United States Figma Full time

    About the Role: Figma is a design tool that empowers teams to create and collaborate on designs. We are seeking an experienced Senior Software Engineering Infrastructure Lead to join our team. Job Description: Lead a team of software engineers in designing and building scalable services to power Figma's infrastructure. Design infrastructure to train, deploy, and...


  • San Francisco, California, United States Unity Technologies Full time

    About the Role: We're seeking a skilled Senior Data and ML Infrastructure Engineer to join our team at Unity. As a key member of our Data & ML Platform team, you will design and optimize large-scale data platforms and machine learning infrastructure systems for efficiency, reliability, and cost-effectiveness. Key Responsibilities: Design and optimize large-scale...


  • San Francisco, United States ZipRecruiter Full time

    Job Description Abridge was founded in 2018 with the mission of powering deeper understanding in healthcare. Our AI-powered platform was purpose-built for medical conversations, improving clinical documentation efficiencies while enabling clinicians to focus on what matters most—their patients. Our enterprise-grade technology transforms patient-clinician...


  • San Francisco, United States Relyance AI Full time

    As Relyance AI's Senior Software Engineer, ML, you will strategize, drive, and execute on the initiatives in NLP for information extraction from legal documents, ML/NLP for information extraction from code and general ML in code analysis, as well as overall AI backend initiatives. You will partner with cross-functional stakeholders to design and build...


  • San Francisco, United States Baseten Full time

    ABOUT BASETEN We’re a growing team of builders backed by top-tier investors, including IVP, Spark Capital, Greylock, and Sarah Guo at Conviction. ML teams at enterprises and category-defining AI-native companies like Descript, Bland.ai, Patreon, Writer, and Robust Intelligence use Baseten to power their core production workloads with best-in-class...


  • AI / ML Engineer

    1 month ago


    San Francisco, United States Seven Seven Software Full time

    AI / ML (Artificial Intelligence, Machine Learning) Engineer 1. Experience in engineering and deploying Generative AI models, specifically focusing on Retrieval-Augmented Generation (RAG) systems and multi-agent workflows. 2. Strong software engineering foundation in developing and implementing state-of-the-art generative techniques and designing advanced...


  • San Francisco, United States CentML Full time

    About Us: We believe AI will fundamentally transform how people live and work. CentML's mission is to massively reduce the cost of developing and deploying ML models so we can enable anyone to harness the power of AI and everyone to benefit from its potential. Our founding team is made up of experts in AI, compilers, and ML hardware and has led efforts at...


  • San Francisco, United States Acceler8 Talent Full time

    Senior Software Engineer (AI Infrastructure / MLOps) Introduction: We are seeking a Senior Software Engineer (AI Infrastructure / MLOps) to join our team. This role offers a unique opportunity to work on cutting-edge MLOps technologies and develop large-scale web applications for data-centric AI. About the Company: Our team comprises MIT PhDs who have worked...

