AI Infrastructure Manager
2 weeks ago
Towards a More Transparent AI Future
Together AI is revolutionizing the field of artificial intelligence by co-designing software, hardware, algorithms, and models. Our mission is to significantly lower the cost of modern AI systems, making them more accessible to everyone. With contributions to leading open-source research, models, and datasets, we are advancing the frontier of AI.
Salary and Benefits
We offer competitive compensation and equity packages that reflect your experience, skills, and job-related knowledge. Enjoy comprehensive health insurance, flexible remote work options, and other benefits that support your well-being and career growth.
About the Role
As our first Project Manager for hardware, you will be instrumental in optimizing and scaling our decentralized GPU resources. Your expertise will ensure the efficient operation of thousands of GPUs distributed across multiple data centers, enabling cutting-edge AI advancements that democratize access to AI technology globally. You'll collaborate with top engineers and innovators to power the next generation of AI-driven solutions.
Key Responsibilities
- Monitor and manage GPU hardware inventory across multiple decentralized data centers.
- Track the lifecycle of GPUs, including acquisition, deployment, usage, maintenance, and decommissioning.
- Develop and maintain a system to log and track all GPU outages or malfunctions.
- Work with engineering, customer success, and operations to resolve outages and document resolutions.
Requirements
This role requires a strong background in technical program management, inventory management, and/or data center operations. Proficiency with inventory management and/or project management systems and tools is essential. Experience with data analytics and report generation for performance monitoring is also required. Excellent communication and problem-solving skills are necessary for success in this position.
Nice to Have
A background in cloud computing platforms or decentralized cloud infrastructure would be beneficial. Certifications in inventory management or data center operations could also be advantageous. Experience tracking and managing the lifecycle of GPUs or similar hardware is highly desirable.
-
Strategic Product Manager
6 days ago
San Francisco, California, United States Snorkel AI Full time We're on a mission to make machine learning accessible to everyone. At Snorkel AI, we're building the definitive AI data development platform. The AI landscape has undergone significant changes over the years, but one thing remains constant: high-quality data is essential for achieving differentiation, high performance, and production-ready systems. Our...
-
Software Engineer, AI Infrastructure
6 days ago
San Francisco, California, United States WaveForms AI Full time Job title: Software Engineer, AI Infrastructure (Training + Inference) / Member of Technical Staff. Who We Are: WaveForms AI is an Audio Large Language Models (LLMs) company building the future of audio intelligence through advanced research and products. Our models will transform human-AI interactions, making them more natural, engaging, and immersive. Role...
-
Project Manager for AI Infrastructure
2 weeks ago
San Francisco, California, United States Together AI Full time About the Job: We invite you to join a passionate group of researchers in our journey in building the next generation AI infrastructure. You will be responsible for managing GPU hardware inventory, developing and maintaining a system to log and track GPU outages, and continuously seeking opportunities to improve GPU tracking processes and systems. About...
-
Technical Product Manager, AI Infrastructure
1 week ago
San Francisco, California, United States Altana AI Full time Company Overview: Altana AI is a pioneering company that applies artificial intelligence to the world's largest organized body of supply chain data. Our mission is to create a more resilient, secure, and sustainable model of global commerce by harnessing the power of AI. We collaborate with leading organizations and government agencies worldwide to build a...
-
Distributed AI Infrastructure Engineer
1 day ago
San Francisco, California, United States Together AI Full time About Together AI: We are a research-driven artificial intelligence company. Our mission is to significantly lower the cost of modern AI systems by co-designing software, hardware, algorithms, and models. Our team has made significant contributions to open-source research, models, and datasets that advance the frontier of AI. We invite you to join our...
-
AI Infrastructure Optimizer
2 weeks ago
San Francisco, California, United States Together AI Full time Job Description: As a key member of Together AI's hardware team, you will be responsible for optimizing and scaling our decentralized GPU resources. This critical role involves ensuring the efficient operation of thousands of GPUs distributed across multiple data centers. Your expertise will enable cutting-edge AI advancements that democratize access to AI...
-
Software Engineer, AI Infrastructure
3 weeks ago
San Francisco, California, United States Waveforms AI, Inc Full time Job title: Software Engineer, AI Infrastructure (Training + Inference) / Member of Technical Staff. Who We Are: WaveForms AI is an Audio Large Language Models (LLMs) company building the future of audio intelligence through advanced research and products. Our models will transform human-AI interactions, making them more natural, engaging, and immersive. Role...
-
AI Infrastructure Specialist
2 weeks ago
San Francisco, California, United States Together AI Full time Company Overview: Together AI is a research-driven artificial intelligence company. We believe open and transparent AI systems will drive innovation and create the best outcomes for society. Our team has been behind technological advancements such as FlashAttention, Hyena, FlexGen, and RedPajama. We invite you to join a passionate group of researchers in our...
-
Technical Program Manager, AI Infrastructure
2 weeks ago
San Francisco, California, United States Together AI Full time Key Responsibilities: • Manage GPU hardware inventory across multiple decentralized data centers • Develop and maintain a system to log and track all GPU outages or malfunctions, including root cause analysis, downtime duration, and replacement cycles • Generate reports on utilization, availability, and performance trends, and recommend improvements •...
-
Senior AI Infrastructure Engineer
3 weeks ago
San Francisco, California, United States Together AI Full time As a Senior AI Infrastructure Engineer, you will be responsible for building the next generation, highly available, global, multi-cloud PaaS platform with open-source technologies to enable and accelerate Together AI's rapid growth. This system spans many diverse environments (Kubernetes, VMs, bare metal compute, and edge deployments) and provides a cohesive...
-
Senior AI Infrastructure Engineer
2 weeks ago
San Francisco, California, United States Together AI Full time About the Role: As a Senior AI Infrastructure Engineer, you will be responsible for building the next generation, highly available, global, multi-cloud PaaS platform with open-source technologies to enable and accelerate Together AI's rapid growth. This system spans many diverse environments (Kubernetes, VMs, bare metal compute, and edge deployments) and...
-
AI Infrastructure Specialist
2 weeks ago
San Francisco, California, United States Distyl AI Full time About Distyl AI: We develop AI native technologies for humans & AI to collaborate and power the operations of the Global Fortune 1000. Our platform, Distillery, along with our team of AI Engineers, Researchers, and Strategists, is pioneering AI-native systems of work. Job Description: We're looking for an experienced AI Platform Engineer to design and...
-
Senior AI Infrastructure Engineer
3 hours ago
San Francisco, California, United States Together AI Full time As a Senior AI Infrastructure Engineer, you will be responsible for building the next generation, highly available, global, multi-cloud PaaS platform with open-source technologies to enable and accelerate Together AI's rapid growth. This system spans many diverse environments (Kubernetes, VMs, bare metal compute, and edge deployments) and provides a cohesive...
-
AI Infrastructure Software Engineer
2 weeks ago
San Francisco, California, United States Together AI Full time At Together AI, we are pushing the boundaries of artificial intelligence by developing state-of-the-art infrastructure for efficient and scalable inference. Our mission is to optimize inference frameworks, algorithms, and infrastructure, ensuring high-performance AI deployment across a diverse range of applications. About the Role: We are seeking an Inference...
-
AI Infrastructure Specialist
1 week ago
San Francisco, California, United States The Rundown AI, Inc. Full time About the Role: The Rundown AI, Inc. is seeking a highly skilled Machine Learning Systems Engineer to join its Model Evaluations team. As a member of this team, you will be responsible for designing, building, and maintaining scalable systems that enable researchers to effectively evaluate models and conduct inference tasks critical to the organization's...
-
AI Infrastructure Specialist
1 week ago
San Francisco, California, United States The Rundown AI, Inc. Full time About the Role: The Rundown AI, Inc. is seeking an AI Infrastructure Specialist to join our Data Encodings and Tokenization team. As a key member of our team, you'll play a crucial role in developing and optimizing the encodings and tokenization systems used throughout our Finetuning workflows. This position requires a strong understanding of machine learning...
-
Enterprise AI Solution Manager
6 days ago
San Francisco, California, United States Snorkel AI Full time At Snorkel AI, we're on a mission to make machine learning practical for everyone. To achieve this, we're building the definitive AI data development platform. The AI landscape has gone through incredible change over the years, but one thing has remained constant: high-quality data is key to achieving differentiation, high performance, and production-ready...
-
AI Product Manager
7 days ago
San Francisco, California, United States Altana AI Full time Transforming Global Commerce with Altana AI: Altana AI is a pioneering organization that harnesses the power of artificial intelligence to revolutionize global commerce. Our cutting-edge platform empowers customers to build resilience, automate trade, and transform industries. As a Principal Technical Product Manager, Graph Product Management, you will play a...
-
AI Systems Infrastructure Specialist
1 week ago
San Francisco, California, United States The Rundown AI, Inc. Full time About The Rundown AI, Inc. Company Overview: The Horizons team at The Rundown AI, Inc. leads the development of our company's reinforcement learning research and advancements in AI systems. We've made significant contributions to all Claude models, with substantial impacts on the autonomy and coding capabilities of Claude 3.5 and 3.7 Sonnet. About the Role: As an...
-
AI Infrastructure Manager
1 week ago
San Francisco, California, United States Labelbox Full time The Challenge: Labelbox is revolutionizing the way we develop AI models by providing a comprehensive platform that powers breakthroughs in AI research and enterprise applications. As an AI Infrastructure Manager, you will play a critical role in shaping the future of AI infrastructure and leading our engineering team to deliver high-quality, scalable, and...