Software Engineer, AI Infrastructure

4 weeks ago


San Francisco, California, United States
WaveForms AI, Inc.
Full time

Job title: Software Engineer, AI Infrastructure (Training + Inference) / Member of Technical Staff

Who We Are

WaveForms AI is an Audio Large Language Model (LLM) company building the future of audio intelligence through advanced research and products. Our models will transform human-AI interactions, making them more natural, engaging, and immersive.

Role overview:

The Software Engineer, AI Infrastructure (Training + Inference) will be responsible for designing, building, and optimizing the infrastructure that powers our large-scale training and real-time inference pipelines. This role combines expertise in distributed computing, system reliability, and performance optimization. The candidate will collaborate with researchers to build scalable systems that support novel multimodal training, and will maintain uptime to deliver consistent results for real-time applications.

Key Responsibilities

Infrastructure Development: Design and implement infrastructure to support large-scale AI training and real-time inference with a focus on multimodal inputs.
Distributed Computing: Build and maintain distributed systems to ensure scalability, efficient resource allocation, and high throughput.
Training Stability: Monitor and enhance the stability of training workflows by addressing bottlenecks, failures, and inefficiencies in large-scale AI pipelines.
Real-time Inference Optimization: Develop and optimize real-time inference systems to deliver low-latency, high-throughput results across diverse applications.
Uptime & Reliability: Implement tools and processes to maintain high uptime and ensure infrastructure reliability during both training and inference phases.
Performance Tuning: Identify and resolve performance bottlenecks, improving overall system throughput and response times.
Collaboration: Work closely with research and engineering teams to integrate infrastructure with AI workflows, ensuring seamless deployment and operation.

Required Skills & Qualifications

Distributed Systems Expertise: Proven experience in designing and managing distributed systems for large-scale AI training and inference.
Infrastructure for AI: Strong background in building and optimizing infrastructure for real-time AI systems, with a focus on multimodal data (audio + text).
Performance Optimization: Expertise in optimizing resource utilization, improving system throughput, and reducing latency in both training and inference.
Training Stability: Experience in troubleshooting and stabilizing AI training pipelines for high reliability and efficiency.
Technical Proficiency: Strong programming skills (Python preferred), proficiency with PyTorch, and familiarity with cloud platforms (AWS, GCP, Azure).

Minimum Experience

4-5 years of relevant professional experience is required.

