AI Infrastructure Specialist
5 days ago
Job Description:
We are seeking a highly skilled AI Infrastructure Specialist to join our team. As an expert in large-scale machine learning infrastructure, you will be responsible for designing and developing scalable and efficient AI systems.
Responsibilities:
- Architect and develop AI infrastructure to support large-scale diffusion models and multi-modal generative AI workloads.
- Optimize model training and inference using PyTorch, Triton, TensorRT, and distributed training libraries.
- Implement and optimize models using sequence parallelism, pipeline parallelism, and tensor parallelism to improve performance on high-throughput training clusters.
- Scale and productionize generative AI models, ensuring efficient deployment on heterogeneous hardware environments.
- Develop and integrate model distillation techniques to enhance the efficiency and performance of generative models.
- Design and maintain an automated model production pipeline for training/inference at scale, integrating distributed data processing frameworks.
- Enhance platform stability and efficiency by refining model orchestration, checkpointing, and retrieval strategies.
About the Role:
This is an exciting opportunity to join a team that is pushing the boundaries of AI research and development. As an AI Infrastructure Specialist, you will work closely with cross-functional teams to ensure seamless model iteration cycles and deployments.
Qualifications:
- B.S., M.S., or Ph.D. in Computer Science, Electrical Engineering, or a related field.
- 3+ years of hands-on experience in large-scale machine learning infrastructure and distributed AI model training.
- Deep expertise in PyTorch and other ML frameworks, and in CUDA optimization.
- Strong understanding of high-performance computing, low-latency inference, and GPU acceleration techniques.
- Hands-on experience in scaling AI infrastructure, leveraging Kubernetes, Docker, Ray, and Triton Inference Server.
-
AI Infrastructure Specialist
2 weeks ago
San Francisco, California, United States The Rundown AI, Inc. Full time
About the Role:
The Rundown AI, Inc. is seeking an AI Infrastructure Specialist to join our Data Encodings and Tokenization team. As a key member of our team, you'll play a crucial role in developing and optimizing the encodings and tokenization systems used throughout our Finetuning workflows. This position requires a strong understanding of machine learning...
-
AI Infrastructure Specialist
3 days ago
San Jose, California, United States beBee Careers Full time
We are building AI-powered agents that empower users to do more with less effort. Our mission is to make automation more intuitive, flexible, and useful, redefining how people delegate work – not just tasks. We are seeking an experienced AI Infrastructure Specialist to drive the next evolution of our AI Agent platform. This individual will be responsible...
-
Senior AI Infrastructure Specialist
3 days ago
San Jose, California, United States beBee Careers Full time
Job Summary:
We are seeking a highly skilled Senior AI Infrastructure Specialist to join our team. In this role, you will be responsible for designing and developing scalable and efficient AI infrastructure to support large-scale diffusion models and multi-modal generative AI workloads.
Key Responsibilities:
Achieve high-performance inferences through advanced...
-
Software Engineer, AI Infrastructure
2 weeks ago
San Francisco, California, United States WaveForms AI Full time
Job title: Software Engineer, AI Infrastructure (Training + Inference) / Member of Technical Staff
Who We Are:
WaveForms AI is an Audio Large Language Models (LLMs) company building the future of audio intelligence through advanced research and products. Our models will transform human-AI interactions making them more natural, engaging and immersive.
Role...
-
AI Infrastructure Specialist
2 weeks ago
San Francisco, California, United States The Rundown AI, Inc. Full time
About the Role:
The Rundown AI, Inc. is seeking a highly skilled Machine Learning Systems Engineer to join its Model Evaluations team. As a member of this team, you will be responsible for designing, building, and maintaining scalable systems that enable researchers to effectively evaluate models and conduct inference tasks critical to the organization's...
-
Distributed AI Infrastructure Engineer
6 days ago
San Francisco, California, United States Together AI Full time
About Together AI:
We are a research-driven artificial intelligence company. Our mission is to significantly lower the cost of modern AI systems by co-designing software, hardware, algorithms, and models. Our team has made significant contributions to open-source research, models, and datasets that advance the frontier of AI. We invite you to join our...
-
AI Infrastructure Specialist
1 week ago
San Francisco, California, United States Replicate, Inc. Full time
We are seeking an experienced AI Infrastructure Specialist to join our team. As a specialist in this area, you will be responsible for designing and implementing our AI infrastructure products. This includes working closely with our AI engineers to understand their needs and preferences, and using this information to inform product decisions. You will be...
-
AI Infrastructure Engineer
7 days ago
San Mateo, California, United States Lumino Ai Full time
About Lumino:
At Lumino Ai, our mission is to harness the potential of AI for humanity. We're building infrastructure that empowers anyone to create AI models.
About the Role:
We're seeking an experienced Machine Learning Engineer to join our team and contribute to setting up the foundations of our company. As a key member, you'll be responsible for designing...
-
Software Engineer, AI Infrastructure
4 weeks ago
San Francisco, California, United States Waveforms AI, Inc Full time
Job title: Software Engineer, AI Infrastructure (Training + Inference) / Member of Technical Staff
Who We Are:
WaveForms AI is an Audio Large Language Models (LLMs) company building the future of audio intelligence through advanced research and products. Our models will transform human-AI interactions making them more natural, engaging and immersive.
Role...
-
AI Systems Infrastructure Specialist
2 weeks ago
San Francisco, California, United States The Rundown AI, Inc. Full time
About The Rundown AI, Inc.
Company Overview:
The Horizons team at The Rundown AI, Inc. leads the development of our company's reinforcement learning research and advancements in AI systems. We've made significant contributions to all Claude models, with substantial impacts on the autonomy and coding capabilities of Claude 3.5 and 3.7 Sonnet.
About the Role:
As an...