Distributed AI Infrastructure Engineer

7 days ago


San Francisco, California, United States Together AI Full time

About Together AI

We are a research-driven artificial intelligence company. Our mission is to significantly lower the cost of modern AI systems by co-designing software, hardware, algorithms, and models.

Our team has made significant contributions to open-source research, models, and datasets that advance the frontier of AI. We invite you to join our passionate group of researchers and engineers in building the next-generation AI infrastructure.

About the Role

This position involves designing and developing large-scale, fault-tolerant distributed machine learning systems that power our accelerated AI initiatives. You will collaborate with our AI researchers and infrastructure teams to ensure our systems are robust and efficient.

Responsibilities

  • Design and build large-scale, distributed machine learning systems that are fault-tolerant and high-performance.
  • Develop and optimize distributed processing frameworks and storage systems.
  • Collaborate with researchers, engineers, and product managers to integrate ML systems into our infrastructure.
  • Conduct architecture and design reviews to ensure best practices in system design.
  • Implement robust monitoring and logging systems to ensure the health and performance of our ML systems.


  • San Francisco, California, United States WaveForms AI Full time

    Job title: Software Engineer, AI Infrastructure (Training + Inference) / Member of Technical Staff. Who We Are: WaveForms AI is an Audio Large Language Model (LLM) company building the future of audio intelligence through advanced research and products. Our models will transform human-AI interactions, making them more natural, engaging, and immersive. Role...


  • San Francisco, California, United States Together AI Full time

    Company Overview: Together AI is a research-driven artificial intelligence company. We believe open and transparent AI systems will drive innovation and create the best outcomes for society. Our team has been behind technological advancements such as FlashAttention, Hyena, FlexGen, and RedPajama. Role Summary: We are seeking a Distributed ML Systems Engineer to...


  • San Francisco, California, United States Naptha AI Full time

    Job Description: Infrastructure Lead (Agent Networks). About this role: We are seeking an exceptional Infrastructure Lead to architect and build the foundational systems that will power the next generation of AI agent networks at Naptha AI. This is a rare opportunity to shape the future of AI agent infrastructure at a massively ambitious scale,...


  • San Francisco, California, United States Together AI Full time

    As a Senior AI Infrastructure Engineer, you will be responsible for building the next generation, highly available, global, multi-cloud PaaS platform with open-source technologies to enable and accelerate Together AI's rapid growth. This system spans many diverse environments (Kubernetes, VMs, bare metal compute, and edge deployments) and provides a cohesive...


  • San Mateo, California, United States Lumino Ai Full time

    About Lumino: At Lumino Ai, our mission is to harness the potential of AI for humanity. We're building infrastructure that empowers anyone to create AI models. About the Role: We're seeking an experienced Machine Learning Engineer to join our team and contribute to setting up the foundations of our company. As a key member, you'll be responsible for designing...


  • San Francisco, California, United States Together AI Full time

    About the Role: We are seeking a skilled Infrastructure Security Engineer to join our team and contribute to building open, transparent, and secure AI systems. As a crucial member of our security team, you will play a vital role in safeguarding our globally distributed systems and infrastructure. Responsibilities: Collaborate with engineering and...


  • San Francisco, California, United States Together AI Full time

    Role: Together AI is seeking a Distributed ML Systems Engineer to design and build scalable machine learning systems that power our accelerated AI initiatives. This role involves developing large-scale, fault-tolerant distributed systems that handle high-load and high-performance requirements. If you are passionate about designing ML systems that operate at...