AI Infrastructure Optimizer

2 weeks ago


San Francisco, California, United States Together AI Full time
Job Description:
As a key member of Together AI's hardware team, you will be responsible for optimizing and scaling our decentralized GPU resources. This critical role involves ensuring the efficient operation of thousands of GPUs distributed across multiple data centers. Your expertise will enable cutting-edge AI advancements that democratize access to AI technology globally.

The responsibilities of this position include:
  1. Monitoring and managing GPU hardware inventory across multiple decentralized data centers
  2. Developing and maintaining a system to log and track all GPU outages or malfunctions
  3. Generating reports on utilization, availability, and performance trends
  4. Continuously seeking opportunities to improve GPU tracking processes and systems

About Together AI:
Together AI is a research-driven artificial intelligence company dedicated to significantly lowering the cost of modern AI systems by co-designing software, hardware, algorithms, and models. Our team has contributed to leading open-source research, models, and datasets to advance the frontier of AI. We are committed to building the next generation of AI infrastructure with passion and innovation.

Requirements:
This position requires:
  • A bachelor's degree in business, information technology, engineering, or a related field
  • At least 3 years of experience in technical program management, inventory management, and/or data center operations/project management
  • Proficiency with inventory management and/or project management systems and tools
  • Experience with data analytics and report generation for performance monitoring
  • Strong communication skills for handling customer inquiries
  • Excellent problem-solving skills and ability to work in a fast-paced environment


  • San Francisco, California, United States WaveForms AI Full time

    Job title: Software Engineer, AI Infrastructure (Training + Inference) / Member of Technical Staff. Who We Are: WaveForms AI is an Audio Large Language Models (LLMs) company building the future of audio intelligence through advanced research and products. Our models will transform human-AI interactions, making them more natural, engaging and immersive. Role...


  • San Francisco, California, United States Waveforms AI, Inc Full time

    Job title: Software Engineer, AI Infrastructure (Training + Inference) / Member of Technical Staff. Who We Are: WaveForms AI is an Audio Large Language Models (LLMs) company building the future of audio intelligence through advanced research and products. Our models will transform human-AI interactions, making them more natural, engaging and immersive. Role...


  • San Francisco, California, United States Together AI Full time

    At Together AI, we are pushing the boundaries of artificial intelligence by developing state-of-the-art infrastructure for efficient and scalable inference. Our mission is to optimize inference frameworks, algorithms, and infrastructure, ensuring high-performance AI deployment across a diverse range of applications. About the Role: We are seeking an Inference...


  • San Francisco, California, United States Together AI Full time

    About Together AI: We are a research-driven artificial intelligence company. Our mission is to significantly lower the cost of modern AI systems by co-designing software, hardware, algorithms, and models. Our team has made significant contributions to open-source research, models, and datasets that advance the frontier of AI. We invite you to join our...


  • San Francisco, California, United States The Rundown AI, Inc. Full time

    About the Role: The Rundown AI, Inc. is seeking a highly skilled Machine Learning Systems Engineer to join its Model Evaluations team. As a member of this team, you will be responsible for designing, building, and maintaining scalable systems that enable researchers to effectively evaluate models and conduct inference tasks critical to the organization's...


  • San Francisco, California, United States The Rundown AI, Inc. Full time

    About the Role: The Rundown AI, Inc. is seeking an AI Infrastructure Specialist to join our Data Encodings and Tokenization team. As a key member of our team, you'll play a crucial role in developing and optimizing the encodings and tokenization systems used throughout our Finetuning workflows. This position requires a strong understanding of machine learning...


  • San Francisco, California, United States The Rundown AI, Inc. Full time

    About The Rundown AI, Inc. Company Overview: The Horizons team at The Rundown AI, Inc. leads the development of our company's reinforcement learning research and advancements in AI systems. We've made significant contributions to all Claude models, with substantial impacts on the autonomy and coding capabilities of Claude 3.5 and 3.7 Sonnet. About the Role: As an...


  • San Francisco, California, United States Together AI Full time

    Company Overview: Towards a More Transparent AI Future. Together AI is revolutionizing the field of artificial intelligence by co-designing software, hardware, algorithms, and models. Our mission is to significantly lower the cost of modern AI systems, making them more accessible to everyone. With contributions to leading open-source research, models, and...


  • San Francisco, California, United States Jobleads-US Full time

    At Jobleads-US, we're seeking a highly skilled Training Dataset and Checkpoint Acceleration Engineer to join our team of experts in developing AI infrastructure. We focus on creating scalable, efficient systems for handling massive datasets and managing large-scale distributed checkpoints. As a key member of our team, you'll work at the intersection of data...


  • San Francisco, California, United States Together AI Full time

    Company Overview: Together AI is a research-driven artificial intelligence company. We believe open and transparent AI systems will drive innovation and create the best outcomes for society. Our team has been behind technological advancements such as FlashAttention, Hyena, FlexGen, and RedPajama. We invite you to join a passionate group of researchers in our...


  • San Francisco, California, United States Deccan AI Full time

    We're seeking a highly skilled AI Performance Optimizer to join our team at Deccan AI. As one of the first salespeople on board, you'll play a crucial role in helping companies improve their AI model performance using high-quality data. Our startup is young and rapidly growing, with partnerships established with some of the biggest tech firms in the industry....

  • Program Manager

    1 week ago


    San Francisco, California, United States The Rundown AI, Inc. Full time

    About the Role: The Rundown AI, Inc. is seeking a highly organized Program Manager - Infrastructure Optimization to join its Capacity Engineering & Efficiency team. This critical role involves leading efforts to develop self-service tools and dashboards to enable anthropic engineers to understand their capacity, efficiency, and costs. You will investigate...


  • San Mateo, California, United States Lumino Ai Full time

    About Lumino: At Lumino Ai, our mission is to harness the potential of AI for humanity. We're building infrastructure that empowers anyone to create AI models. About the Role: We're seeking an experienced Machine Learning Engineer to join our team and contribute to setting up the foundations of our company. As a key member, you'll be responsible for designing...


  • San Francisco, California, United States Distyl AI Full time

    About Distyl AI: We develop AI-native technologies for humans & AI to collaborate and power the operations of the Global Fortune 1000. Our platform, Distillery, along with our team of AI Engineers, Researchers, and Strategists, is pioneering AI-native systems of work. Job Description: We're looking for an experienced AI Platform Engineer to design and...


  • San Francisco, California, United States Together AI Full time

    As a Senior AI Infrastructure Engineer, you will be responsible for building the next generation, highly available, global, multi-cloud PaaS platform with open-source technologies to enable and accelerate Together AI's rapid growth. This system spans many diverse environments (Kubernetes, VMs, bare metal compute, and edge deployments) and provides a cohesive...


  • San Francisco, California, United States Together AI Full time

    About the Role: As a Senior AI Infrastructure Engineer, you will be responsible for building the next generation, highly available, global, multi-cloud PaaS platform with open-source technologies to enable and accelerate Together AI's rapid growth. This system spans many diverse environments (Kubernetes, VMs, bare metal compute, and edge deployments) and...


  • San Francisco, California, United States The Rundown AI, Inc. Full time

    Key Responsibilities: We're seeking a talented engineer to join our team and take on the following projects: design and implement high-performance data pipelines for processing large-scale code datasets with an emphasis on reliability and reproducibility; build and maintain secure sandboxed execution environments using virtualization technologies like gVisor and...

  • AI Researcher

    7 days ago


    San Francisco, California, United States WaveForms AI Full time

    Company Overview: WaveForms AI is a pioneering Audio Large Language Models (LLMs) company at the forefront of audio intelligence innovation. Our mission is to push the boundaries of multimodal AI systems, combining cutting-edge research and products to revolutionize the field. Job Description: The Research Engineer – Pre-training & Post-training role is a...


  • San Francisco, California, United States Source Technology Full time

    We are seeking a highly skilled AI Infrastructure Engineer to join our team on a contract basis. The ideal candidate will have experience in designing, deploying, and managing scalable infrastructure for AI and machine learning (ML) applications. This role will focus on optimizing workflows, ensuring system reliability, and enabling seamless integration of...