Current jobs related to Technical Program Manager, AI Infrastructure - San Francisco, California - Together AI


  • San Francisco, California, United States The Rundown AI, Inc. Full time

    About the Team: The Infrastructure Foundations team delivers specialized infrastructure tailored explicitly to OpenAI's demanding research and product workloads. We build foundational infrastructure, from physical datacenter selection and hardware procurement to network design, driven by clear, intuitive understanding of current and future workloads. About...


  • San Francisco, California, United States WaveForms AI Full time

    Job title: Software Engineer, AI Infrastructure (Training + Inference) / Member of Technical Staff. Who We Are: WaveForms AI is an Audio Large Language Models (LLMs) company building the future of audio intelligence through advanced research and products. Our models will transform human-AI interactions, making them more natural, engaging and immersive. Role...


  • San Francisco, California, United States Together AI Full time

    Role: As TPM at Together AI, you will be at the core of building, optimizing, and scaling the global GPU resources needed for a pioneering AI infrastructure company. Your role is crucial in ensuring that the backbone of our AI models, thousands of GPUs distributed around the world, operates efficiently and reliably, enabling cutting-edge AI advancements that...


  • San Francisco, California, United States The Rundown AI, Inc. Full time

    About The Team: The Infrastructure Foundations team delivers specialized infrastructure tailored explicitly to OpenAI's demanding research and product workloads. We build foundational infrastructure, from physical datacenter selection and hardware procurement to network design, driven by clear, intuitive understanding of current and future workloads. This role...


  • San Francisco, California, United States Waveforms AI, Inc Full time

    Job title: Software Engineer, AI Infrastructure (Training + Inference) / Member of Technical Staff. Who We Are: WaveForms AI is an Audio Large Language Models (LLMs) company building the future of audio intelligence through advanced research and products. Our models will transform human-AI interactions, making them more natural, engaging and immersive. Role...


  • San Francisco, California, United States 11x AI Inc. Full time

    Empowering Human Potential with AI. At 11x AI Inc., we believe that humans should be free to focus on high-value tasks, leaving routine work to our autonomous digital workers. Our mission is to accelerate human progress by harnessing the power of artificial intelligence. We've made significant strides in achieving our goal, with a 20x increase in ARR and...


  • San Francisco, California, United States Snorkel AI Full time

    We're on a mission to make machine learning accessible to everyone. At Snorkel AI, we're building the definitive AI data development platform. The AI landscape has undergone significant changes over the years, but one thing remains constant: high-quality data is essential for achieving differentiation, high performance, and production-ready systems. Our...

Technical Program Manager, AI Infrastructure

2 weeks ago


San Francisco, California, United States Together AI Full time
Key Responsibilities:
• Manage GPU hardware inventory across multiple decentralized data centers
• Develop and maintain a system to log and track all GPU outages or malfunctions, including root cause analysis, downtime duration, and replacement cycles
• Generate reports on utilization, availability, and performance trends, and recommend improvements
• Work with engineering, customer success, and operations to resolve outages, documenting resolutions and lessons learned for continuous improvement

We are committed to creating a diverse and inclusive workplace that reflects the communities we serve. We welcome applications from candidates who share our values and are passionate about building the next generation of AI infrastructure. If you are a motivated, experienced professional, we encourage you to apply.