AI Infrastructure Optimizer
2 weeks ago
As a key member of Together AI's hardware team, you will be responsible for optimizing and scaling our decentralized GPU resources. This critical role involves ensuring the efficient operation of thousands of GPUs distributed across multiple data centers. Your expertise will enable cutting-edge AI advancements that democratize access to AI technology globally.
The responsibilities of this position include:
- Monitoring and managing GPU hardware inventory across multiple decentralized data centers
- Developing and maintaining a system to log and track all GPU outages or malfunctions
- Generating reports on utilization, availability, and performance trends
- Continuously seeking opportunities to improve GPU tracking processes and systems
About Together AI:
Together AI is a research-driven artificial intelligence company dedicated to significantly lowering the cost of modern AI systems by co-designing software, hardware, algorithms, and models. Our team has contributed to leading open-source research, models, and datasets to advance the frontier of AI. We are committed to building the next generation of AI infrastructure with passion and innovation.
Requirements:
- A bachelor's degree in business, information technology, or an engineering-related field
- At least 3 years of experience in technical program management, inventory management, and/or data center operations/project management
- Proficiency with inventory management and/or project management systems and tools
- Experience with data analytics and report generation for performance monitoring
- Strong communication skills for handling customer inquiries
- Excellent problem-solving skills and ability to work in a fast-paced environment
-
Software Engineer, AI Infrastructure
6 days ago
San Francisco, California, United States | WaveForms AI | Full time
Job title: Software Engineer, AI Infrastructure (Training + Inference) / Member of Technical Staff
Who We Are:
WaveForms AI is an Audio Large Language Models (LLMs) company building the future of audio intelligence through advanced research and products. Our models will transform human-AI interactions, making them more natural, engaging, and immersive.
Role...
-
Software Engineer, AI Infrastructure
3 weeks ago
San Francisco, California, United States | Waveforms AI, Inc | Full time
Job title: Software Engineer, AI Infrastructure (Training + Inference) / Member of Technical Staff
Who We Are:
WaveForms AI is an Audio Large Language Models (LLMs) company building the future of audio intelligence through advanced research and products. Our models will transform human-AI interactions, making them more natural, engaging, and immersive.
Role...
-
AI Infrastructure Software Engineer
2 weeks ago
San Francisco, California, United States | Together AI | Full time
At Together AI, we are pushing the boundaries of artificial intelligence by developing state-of-the-art infrastructure for efficient and scalable inference. Our mission is to optimize inference frameworks, algorithms, and infrastructure, ensuring high-performance AI deployment across a diverse range of applications.
About the Role:
We are seeking an Inference...
-
Distributed AI Infrastructure Engineer
1 day ago
San Francisco, California, United States | Together AI | Full time
About Together AI:
We are a research-driven artificial intelligence company. Our mission is to significantly lower the cost of modern AI systems by co-designing software, hardware, algorithms, and models.
Our team has made significant contributions to open-source research, models, and datasets that advance the frontier of AI. We invite you to join our...
-
AI Infrastructure Specialist
1 week ago
San Francisco, California, United States | The Rundown AI, Inc. | Full time
About the Role:
The Rundown AI, Inc. is seeking a highly skilled Machine Learning Systems Engineer to join its Model Evaluations team. As a member of this team, you will be responsible for designing, building, and maintaining scalable systems that enable researchers to effectively evaluate models and conduct inference tasks critical to the organization's...
-
AI Infrastructure Specialist
1 week ago
San Francisco, California, United States | The Rundown AI, Inc. | Full time
About the Role:
The Rundown AI, Inc. is seeking an AI Infrastructure Specialist to join our Data Encodings and Tokenization team. As a key member of our team, you'll play a crucial role in developing and optimizing the encodings and tokenization systems used throughout our Finetuning workflows.
This position requires a strong understanding of machine learning...
-
AI Systems Infrastructure Specialist
1 week ago
San Francisco, California, United States | The Rundown AI, Inc. | Full time
About The Rundown AI, Inc.
Company Overview:
The Horizons team at The Rundown AI, Inc. leads the development of our company's reinforcement learning research and advancements in AI systems. We've made significant contributions to all Claude models, with substantial impacts on the autonomy and coding capabilities of Claude 3.5 and 3.7 Sonnet.
About the Role:
As an...
-
AI Infrastructure Manager
2 weeks ago
San Francisco, California, United States | Together AI | Full time
Company Overview:
Towards a More Transparent AI Future
Together AI is revolutionizing the field of artificial intelligence by co-designing software, hardware, algorithms, and models. Our mission is to significantly lower the cost of modern AI systems, making them more accessible to everyone. With contributions to leading open-source research, models, and...
-
AI Infrastructure Optimization Specialist
1 week ago
San Francisco, California, United States | Jobleads-US | Full time
At Jobleads-US, we're seeking a highly skilled Training Dataset and Checkpoint Acceleration Engineer to join our team of experts in developing AI infrastructure.
We focus on creating scalable, efficient systems for handling massive datasets and managing large-scale distributed checkpoints. As a key member of our team, you'll work at the intersection of data...
-
AI Infrastructure Specialist
2 weeks ago
San Francisco, California, United States | Together AI | Full time
Company Overview:
Together AI is a research-driven artificial intelligence company. We believe open and transparent AI systems will drive innovation and create the best outcomes for society. Our team has been behind technological advancements such as FlashAttention, Hyena, FlexGen, and RedPajama.
We invite you to join a passionate group of researchers in our...
-
AI Performance Optimizer
6 days ago
San Francisco, California, United States | Deccan AI | Full time
We're seeking a highly skilled AI Performance Optimizer to join our team at Deccan AI. As one of the first salespeople on board, you'll play a crucial role in helping companies improve their AI model performance using high-quality data.
Our startup is young and rapidly growing, with partnerships established with some of the biggest tech firms in the industry....
-
Program Manager
1 week ago
San Francisco, California, United States | The Rundown AI, Inc. | Full time
About the Role:
The Rundown AI, Inc. is seeking a highly organized Program Manager - Infrastructure Optimization to join its Capacity Engineering & Efficiency team. This critical role involves leading efforts to develop self-service tools and dashboards to enable anthropic engineers to understand their capacity, efficiency, and costs.
You will investigate...
-
AI Infrastructure Engineer
2 days ago
San Mateo, California, United States | Lumino Ai | Full time
About Lumino:
At Lumino Ai, our mission is to harness the potential of AI for humanity. We're building infrastructure that empowers anyone to create AI models.
About the Role:
We're seeking an experienced Machine Learning Engineer to join our team and contribute to setting up the foundations of our company. As a key member, you'll be responsible for designing...
-
AI Infrastructure Specialist
2 weeks ago
San Francisco, California, United States | Distyl AI | Full time
About Distyl AI:
We develop AI-native technologies for humans & AI to collaborate and power the operations of the Global Fortune 1000. Our platform, Distillery, along with our team of AI Engineers, Researchers, and Strategists, is pioneering AI-native systems of work.
Job Description:
We're looking for an experienced AI Platform Engineer to design and...
-
Senior AI Infrastructure Engineer
3 weeks ago
San Francisco, California, United States | Together AI | Full time
As a Senior AI Infrastructure Engineer, you will be responsible for building the next-generation, highly available, global, multi-cloud PaaS platform with open-source technologies to enable and accelerate Together AI's rapid growth.
This system spans many diverse environments (Kubernetes, VMs, bare metal compute, and edge deployments) and provides a cohesive...
-
Senior AI Infrastructure Engineer
2 weeks ago
San Francisco, California, United States | Together AI | Full time
About the Role:
As a Senior AI Infrastructure Engineer, you will be responsible for building the next-generation, highly available, global, multi-cloud PaaS platform with open-source technologies to enable and accelerate Together AI's rapid growth. This system spans many diverse environments (Kubernetes, VMs, bare metal compute, and edge deployments) and...
-
AI Research Infrastructure Specialist
5 days ago
San Francisco, California, United States | The Rundown AI, Inc. | Full time
Key Responsibilities:
We're seeking a talented engineer to join our team and take on the following projects:
- Design and implement high-performance data pipelines for processing large-scale code datasets with an emphasis on reliability and reproducibility
- Build and maintain secure sandboxed execution environments using virtualization technologies like gVisor and...
-
AI Researcher
7 days ago
San Francisco, California, United States | WaveForms AI | Full time
Company Overview:
WaveForms AI is a pioneering Audio Large Language Models (LLMs) company at the forefront of audio intelligence innovation. Our mission is to push the boundaries of multimodal AI systems, combining cutting-edge research and products to revolutionize the field.
Job Description:
The Research Engineer – Pre-training & Post-training role is a...
-
Senior AI Infrastructure Engineer
3 hours ago
San Francisco, California, United States | Together AI | Full time
As a Senior AI Infrastructure Engineer, you will be responsible for building the next-generation, highly available, global, multi-cloud PaaS platform with open-source technologies to enable and accelerate Together AI's rapid growth.
This system spans many diverse environments (Kubernetes, VMs, bare metal compute, and edge deployments) and provides a cohesive...
-
AI Infrastructure Engineer
1 week ago
San Francisco, California, United States | Source Technology | Full time
We are seeking a highly skilled AI Infrastructure Engineer to join our team on a contract basis. The ideal candidate will have experience in designing, deploying, and managing scalable infrastructure for AI and machine learning (ML) applications. This role will focus on optimizing workflows, ensuring system reliability, and enabling seamless integration of...