Current jobs related to AI Infrastructure Specialist - San Francisco, California - Together AI


  • San Francisco, California, United States The Rundown AI, Inc. Full time

    About the Role: The Rundown AI, Inc. is seeking an AI Infrastructure Specialist to join our Data Encodings and Tokenization team. As a key member of our team, you'll play a crucial role in developing and optimizing the encodings and tokenization systems used throughout our Finetuning workflows. This position requires a strong understanding of machine learning...


  • San Francisco, California, United States Together AI Full time

    Company Overview: Together AI is a research-driven artificial intelligence company. We believe open and transparent AI systems will drive innovation and create the best outcomes for society. Our team has been behind technological advancements such as FlashAttention, Hyena, FlexGen, and RedPajama. We invite you to join a passionate group of researchers in our...


  • San Francisco, California, United States Distyl AI Full time

    **About Distyl AI** We develop AI-native technologies for humans & AI to collaborate and power the operations of the Global Fortune 1000. Our platform, Distillery, along with our team of AI Engineers, Researchers, and Strategists, is pioneering AI-native systems of work. **Job Description** We're looking for an experienced AI Platform Engineer to design and...


  • San Francisco, California, United States WaveForms AI Full time

    Job title: Software Engineer, AI Infrastructure (Training + Inference) / Member of Technical Staff. Who We Are: WaveForms AI is an Audio Large Language Models (LLMs) company building the future of audio intelligence through advanced research and products. Our models will transform human-AI interactions, making them more natural, engaging, and immersive. Role...


  • San Francisco, California, United States The Rundown AI, Inc. Full time

    About the Role: The Rundown AI, Inc. is seeking a highly skilled Machine Learning Systems Engineer to join its Model Evaluations team. As a member of this team, you will be responsible for designing, building, and maintaining scalable systems that enable researchers to effectively evaluate models and conduct inference tasks critical to the organization's...


  • San Francisco, California, United States Together AI Full time

    About Together AI: We are a research-driven artificial intelligence company. Our mission is to significantly lower the cost of modern AI systems by co-designing software, hardware, algorithms, and models. Our team has made significant contributions to open-source research, models, and datasets that advance the frontier of AI. We invite you to join our...


  • San Francisco, California, United States Replicate, Inc. Full time

    We are seeking an experienced AI Infrastructure Specialist to join our team. As a specialist in this area, you will be responsible for designing and implementing our AI infrastructure products. This includes working closely with our AI engineers to understand their needs and preferences, and using this information to inform product decisions. You will be...


  • San Francisco, California, United States The Rundown AI, Inc. Full time

    About The Rundown AI, Inc. Company Overview: The Horizons team at The Rundown AI, Inc. leads the development of our company's reinforcement learning research and advancements in AI systems. We've made significant contributions to all Claude models, with substantial impacts on the autonomy and coding capabilities of Claude 3.5 and 3.7 Sonnet. About the Role: As an...


  • San Francisco, California, United States Waveforms AI, Inc Full time

    Job title: Software Engineer, AI Infrastructure (Training + Inference) / Member of Technical Staff. Who We Are: WaveForms AI is an Audio Large Language Models (LLMs) company building the future of audio intelligence through advanced research and products. Our models will transform human-AI interactions, making them more natural, engaging, and immersive. Role...


  • San Francisco, California, United States Altana AI Full time

    About Us: Altana AI is backed by leading investors and used by the world's most important organizations. Our customers connect to the Altana network to build resilience for critical industries and infrastructure, automate and safeguard cross-border trade, transform insurance underwriting, protect national security, combat modern slave labor, disrupt fentanyl...


  • San Francisco, California, United States Together AI Full time

    Job Description: As a key member of Together AI's hardware team, you will be responsible for optimizing and scaling our decentralized GPU resources. This critical role involves ensuring the efficient operation of thousands of GPUs distributed across multiple data centers. Your expertise will enable cutting-edge AI advancements that democratize access to AI...


  • San Francisco, California, United States Together AI Full time

    As a Senior AI Infrastructure Engineer, you will be responsible for building the next generation, highly available, global, multi-cloud PaaS platform with open-source technologies to enable and accelerate Together AI's rapid growth. This system spans many diverse environments (Kubernetes, VMs, bare metal compute, and edge deployments) and provides a cohesive...


  • San Francisco, California, United States Together AI Full time

    About the Role: As a Senior AI Infrastructure Engineer, you will be responsible for building the next generation, highly available, global, multi-cloud PaaS platform with open-source technologies to enable and accelerate Together AI's rapid growth. This system spans many diverse environments (Kubernetes, VMs, bare metal compute, and edge deployments) and...


  • San Francisco, California, United States Together AI Full time

    At Together AI, we are pushing the boundaries of artificial intelligence by developing state-of-the-art infrastructure for efficient and scalable inference. Our mission is to optimize inference frameworks, algorithms, and infrastructure, ensuring high-performance AI deployment across a diverse range of applications. About the Role: We are seeking an Inference...


  • San Francisco, California, United States Together AI Full time

    Company Overview: Towards a More Transparent AI Future. Together AI is revolutionizing the field of artificial intelligence by co-designing software, hardware, algorithms, and models. Our mission is to significantly lower the cost of modern AI systems, making them more accessible to everyone. With contributions to leading open-source research, models, and...


  • San Francisco, California, United States Cisco Systems Full time

    Cisco Systems is a leader in AI security, and we're seeking a talented Cloud AI Infrastructure Specialist to join our AI Defense team. As a key member of our ML Team, you'll design the data platform driving cutting-edge ML algorithms that detect and protect against AI security risks. About Our Team: We are a team of passionate individuals who are dedicated to...


  • San Francisco, California, United States Snorkel AI Full time

    We're on a mission to make machine learning accessible to everyone. At Snorkel AI, we're building the definitive AI data development platform. The AI landscape has undergone significant changes over the years, but one thing remains constant: high-quality data is essential for achieving differentiation, high performance, and production-ready systems. Our...


  • San Francisco, California, United States Together AI Full time

    About Together AI: We are a research-driven artificial intelligence company committed to developing open and transparent AI systems. Our mission is to significantly lower the cost of modern AI systems by co-designing software, hardware, algorithms, and models. Our team has contributed to leading open-source research, models, and datasets advancing the frontier...


  • San Francisco, California, United States Coastal Carbon Full time

    About the Role: We are seeking a highly skilled AI Infrastructure Specialist to join our team at Coastal Carbon. As an MLOps Engineer, you will be responsible for designing, developing, and maintaining large-scale machine learning infrastructure. Key Responsibilities: Design and implement scalable pipelines for efficient model training and inference. Develop...

AI Infrastructure Specialist

3 weeks ago


San Francisco, California, United States Together AI Full time
About the Role

As a key member of our team, you will be responsible for building, optimizing, and scaling global GPU resources for a pioneering AI infrastructure company. Your role is crucial in ensuring that thousands of GPUs distributed worldwide operate efficiently and reliably, enabling cutting-edge AI advancements that democratize access to AI technology globally.

Key Responsibilities
  • Develop and execute strategic plans for large-scale GPU cluster deployments, ensuring timely delivery within budget and quality standards.
  • Coordinate with external data center providers and hardware vendors on timelines, and ensure seamless integration with internal teams.
  • Identify and mitigate risks, and develop contingency plans to ensure business continuity.
  • Communicate project progress, status, and plans to internal stakeholders and customer groups, ensuring transparency and alignment across the organization.

Requirements
  • 7+ years of experience in technical program or project management, with a focus on large-scale technology deployments.
  • A technical background and the ability to engage on technical topics, typically demonstrated by an engineering degree or equivalent technical experience.
  • Experience with cloud computing platforms, decentralized cloud infrastructure, and/or similar large-scale technology deployments.
  • Experience with cloud-based technologies, such as AWS, Google Cloud, or Azure, and distributed systems, including containerization and orchestration tools.
  • Knowledge of data center operations, including power, cooling, and networking systems.
  • Excellent communication and project management skills, with the ability to work effectively with external vendors and internal stakeholders.
  • Strong process management skills, with experience in developing and implementing processes to ensure efficient and effective deployment of large-scale infrastructure.