AI Infrastructure Engineer
2 weeks ago
About the Role
We’re hiring an AI Infrastructure Engineer to shape and scale the backend systems that power our AI platform. Because we are a Series A company, your work will be foundational, enabling safe, efficient, and reliable AI workflows from end to end.

What You’ll Do
Design and implement scalable backend architectures for AI workloads (inference, orchestration, monitoring).
Own distributed job orchestration with Temporal and related systems.
Improve data pipeline performance by designing smarter caching strategies (e.g., file deduplication, hot/cold storage, Redis caching layers) to reduce redundant compute and API calls.
Build observability, monitoring, retries, and fault tolerance into all workflows.
Manage infrastructure reliability, incident response, and performance.
Develop tooling and platform infrastructure to support rapid growth.
Partner with ML engineers to bring models to production at scale.

What We’re Looking For
4+ years of backend engineering (Python is a must).
Strong background in distributed systems, job orchestration, and task queues.
Deep knowledge of concurrency, parallelism, and multithreading (including async/await, event loops, thread pools, synchronization primitives, deadlocks, and race conditions) is a must. You should know how to design systems that maximize throughput without sacrificing correctness or safety.
Hands-on experience with Temporal, Redis, Airflow, Celery, RabbitMQ (or similar).
Experience with LLM serving and routing fundamentals (rate limiting, streaming, load balancing, budgets).
Comfortable with containers and orchestration: Docker, Kubernetes.
Familiarity with cloud platforms (AWS/GCP) and IaC (Terraform).
Experience with multiple storage systems: S3, Postgres, MongoDB, Redis, and Elasticsearch.
Track record of scaling systems in startups or fast-paced environments.
Understanding of deploying, monitoring, and optimizing AI/ML systems in production with strong CI/CD practices.

Why You’ll Love Working Here
Play a foundational role at a fast-growing Series A startup that is shaping the future of AI in enterprise workflows.
Collaborate across Product, ML, and Platform teams as the bridge between AI logic and scalable execution.
Build infrastructure that enables real value for large enterprises: low-code, secure, and scalable AI workflows.
Join a company that’s scaling thoughtfully and values developer experience.
-
Solutions Engineer
3 weeks ago
San Francisco, United States Novita AI Full time
About Us: We are a high-growth, global AI cloud infrastructure provider at the forefront of the artificial intelligence revolution. Our cutting-edge platform offers developers and enterprises powerful, scalable, and easy-to-use solutions, including Model APIs, GPU Instances, and Serverless Computing. As businesses worldwide race to integrate AI into their...
-
San Francisco, United States Skild AI Full time
Company Overview: At Skild AI, we are building the world's first general purpose robotic intelligence that is robust and adapts to unseen scenarios without failing. We believe massive scale through data-driven machine learning is the key to unlocking these capabilities for the widespread deployment of robots within society. Our team consists of individuals...
-
Software Engineer, Infrastructure
3 weeks ago
San Francisco, United States Runloop AI Full time
About Runloop: Runloop is pioneering the next generation of AI-driven software engineering. Our platform empowers developers to build, scale, and optimize AI-powered coding solutions, accelerating the future of software development. We’re a small team of former Google and Stripe...
-
AI Infrastructure Engineer, Core Infrastructure
3 weeks ago
San Francisco, CA, United States Scale AI Full time
As a Software Engineer on the ML Infrastructure team, you will design and build the next generation of foundational systems that power all ML Infrastructure compute at Scale - from model training and evaluation to large-scale inference and experimentation. Our platform is responsible for orchestrating workloads across heterogeneous compute environments (GPU,...
-
Software Engineer, AI Training Infrastructure
3 weeks ago
San Mateo, United States Fireworks AI Full time
About Us: At Fireworks, we're building the future of generative AI infrastructure. Our platform delivers the highest-quality models with the fastest and most scalable inference in the industry. We've been independently benchmarked as the leader in LLM inference speed and are driving cutting-edge innovation through projects like our own function calling and...
-
Software Engineer, AI Training Infrastructure
2 weeks ago
San Mateo, CA, United States Fireworks AI Full time
About Us: At Fireworks, we're building the future of generative AI infrastructure. Our platform delivers the highest-quality models with the fastest and most scalable inference in the industry. We've been independently benchmarked as the leader in LLM inference speed and are driving cutting-edge innovation through projects like our own function calling and...
-
Software Engineer, Infrastructure
3 weeks ago
San Francisco, United States Runloop AI, Inc. Full time
Runloop is pioneering the next generation of AI-driven software engineering. Our platform empowers developers to build, scale, and optimize AI-powered coding solutions, accelerating the future of software development. We're a small team of former Google and Stripe engineers dedicated to solving the complex challenges of productionizing AI for software...
-
Infrastructure Engineer, Data Platform
3 weeks ago
San Francisco, United States Together AI Full time
About The Role: Together AI is hiring a Lead Cloud Infrastructure Engineer to own and operate the cloud foundation that powers our rapidly...