Member of Technical Staff, AI Training Infrastructure

2 weeks ago

San Mateo, California, United States | Fireworks AI | Full time

About Us:
At Fireworks, we're building the future of generative AI infrastructure. Our platform delivers the highest-quality models with the fastest and most scalable inference in the industry. We've been independently benchmarked as the leader in LLM inference speed and are driving cutting-edge innovation through projects like our own function calling and multimodal models. Fireworks is a Series C company valued at $4 billion and backed by top investors including Benchmark, Sequoia, Lightspeed, Index, and Evantic. We're an ambitious, collaborative team of builders, founded by veterans of Meta PyTorch and Google Vertex AI.

The Role:
As a Training Infrastructure Engineer, you'll design, build, and optimize the high-performance infrastructure that powers our large-scale model training operations. You'll collaborate with AI researchers and engineers to create robust training pipelines, optimize distributed training workloads, and ensure reliable model development.

Key Responsibilities:

Design and implement scalable infrastructure for large-scale model training workloads
Develop and maintain distributed training pipelines for LLMs and multimodal models
Optimize training performance across multiple GPUs, nodes, and data centers
Implement monitoring, logging, and debugging tools for training operations
Architect and maintain data storage solutions for large-scale training datasets
Automate infrastructure provisioning, scaling, and orchestration for model training
Collaborate with researchers to implement and optimize training methodologies
Analyze and improve efficiency, scalability, and cost-effectiveness of training systems
Troubleshoot complex performance issues in distributed training environments

Minimum Qualifications:

Bachelor's degree in Computer Science, Computer Engineering, or related field, or equivalent practical experience
3+ years of experience with distributed systems and ML infrastructure
Experience with PyTorch
Proficiency in cloud platforms (AWS, GCP, Azure)
Experience with containerization and orchestration (Docker, Kubernetes)
Knowledge of distributed training techniques (data parallelism, model parallelism, FSDP)

Preferred Qualifications:

Master's or PhD in Computer Science or related field
Experience training large language models or multimodal AI systems
Experience with ML workflow orchestration tools
Background in optimizing high-performance distributed computing systems
Familiarity with ML DevOps practices
Contributions to open-source ML infrastructure or related projects

Total compensation for this role includes a competitive salary, meaningful equity in a fast-growing startup, and a comprehensive benefits package. Base salary is determined by a range of factors including individual qualifications, experience, skills, interview performance, market data, and work location. The listed salary range is intended as a guideline and may be adjusted.

Base Pay Range (Plus Equity)

$175,000–$220,000 USD

Why Fireworks AI?

Solve Hard Problems: Tackle challenges at the forefront of AI infrastructure, from low-latency inference to scalable model serving.
Build What's Next: Work with bleeding-edge technology that impacts how businesses and developers harness AI globally.
Ownership & Impact: Join a fast-growing, passionate team where your work directly shapes the future of AI—no bureaucracy, just results.
Learn from the Best: Collaborate with world-class engineers and AI researchers who thrive on curiosity and innovation.

Fireworks AI is an equal-opportunity employer. We celebrate diversity and are committed to creating an inclusive environment for all innovators.

