Member of Technical Staff, AI Training Infrastructure
2 weeks ago
About Us:
At Fireworks, we're building the future of generative AI infrastructure. Our platform delivers the highest-quality models with the fastest and most scalable inference in the industry. We've been independently benchmarked as the leader in LLM inference speed and are driving cutting-edge innovation through projects like our own function calling and multimodal models. Fireworks is a Series C company valued at $4 billion and backed by top investors including Benchmark, Sequoia, Lightspeed, Index, and Evantic. We're an ambitious, collaborative team of builders, founded by veterans of Meta PyTorch and Google Vertex AI.
The Role:
As a Training Infrastructure Engineer, you'll design, build, and optimize the infrastructure that powers our large-scale model training operations. You'll collaborate with AI researchers and engineers to create robust training pipelines, optimize distributed training workloads, and ensure reliable model development.
Key Responsibilities:
- Design and implement scalable infrastructure for large-scale model training workloads
- Develop and maintain distributed training pipelines for LLMs and multimodal models
- Optimize training performance across multiple GPUs, nodes, and data centers
- Implement monitoring, logging, and debugging tools for training operations
- Architect and maintain data storage solutions for large-scale training datasets
- Automate infrastructure provisioning, scaling, and orchestration for model training
- Collaborate with researchers to implement and optimize training methodologies
- Analyze and improve efficiency, scalability, and cost-effectiveness of training systems
- Troubleshoot complex performance issues in distributed training environments
Minimum Qualifications:
- Bachelor's degree in Computer Science, Computer Engineering, or related field, or equivalent practical experience
- 3+ years of experience with distributed systems and ML infrastructure
- Experience with PyTorch
- Proficiency in cloud platforms (AWS, GCP, Azure)
- Experience with containerization and orchestration (Docker, Kubernetes)
- Knowledge of distributed training techniques (data parallelism, model parallelism, FSDP)
Preferred Qualifications:
- Master's or PhD in Computer Science or related field
- Experience training large language models or multimodal AI systems
- Experience with ML workflow orchestration tools
- Background in optimizing high-performance distributed computing systems
- Familiarity with ML DevOps practices
- Contributions to open-source ML infrastructure or related projects
Total compensation for this role includes a competitive base salary, meaningful equity in a fast-growing startup, and a comprehensive benefits package. Base salary is determined by a range of factors including individual qualifications, experience, skills, interview performance, market data, and work location. The listed salary range is intended as a guideline and may be adjusted.
Base Pay Range (Plus Equity)
$175,000–$220,000 USD
Why Fireworks AI?
- Solve Hard Problems: Tackle challenges at the forefront of AI infrastructure, from low-latency inference to scalable model serving.
- Build What's Next: Work with bleeding-edge technology that impacts how businesses and developers harness AI globally.
- Ownership & Impact: Join a fast-growing, passionate team where your work directly shapes the future of AI—no bureaucracy, just results.
- Learn from the Best: Collaborate with world-class engineers and AI researchers who thrive on curiosity and innovation.
Fireworks AI is an equal-opportunity employer. We celebrate diversity and are committed to creating an inclusive environment for all innovators.