AI Infrastructure Engineer
2 weeks ago
A stealth-stage AI infrastructure company is building a self-healing system for software that automates defect resolution and development. The platform is used by engineering and support teams to:
Autonomously debug problems in production software
Fix issues directly in the codebase
Prevent recurring issues through intelligent root-cause automation
The company is backed by top-tier investors such as Foundation Capital, WndrCo, and Green Bay Ventures, as well as prominent operators including Matei Zaharia, Drew Houston, Dylan Field, Guillermo Rauch, and others.
We believe that as software development accelerates, the burden of maintaining quality and reliability shifts heavily onto engineering and support teams. This challenge creates a rare opportunity to reimagine how software is supported and sustained—with AI-powered systems that respond autonomously.
About the Role
We're looking for an experienced backend/infrastructure engineer who thrives at the intersection of systems and AI, and who loves turning research prototypes into rock-solid production services. You'll design and scale the core backend that powers our AI inference stack, from ingestion pipelines and feature stores to GPU orchestration and vector search.
If you care deeply about performance, correctness, observability, and fast iteration, you'll fit right in.
What You'll Do
Own mission-critical services end-to-end, from architecture and design reviews to deployment, observability, and service-level objectives.
Scale LLM-driven systems: build RAG pipelines, vector indexes, and evaluation frameworks handling billions of events per day.
Design data-heavy backends: streaming ETL, columnar storage, time-series analytics — all fueling the self-healing loop.
Optimize for cost and latency across compute types (CPUs, GPUs, serverless); profile hot paths and squeeze out milliseconds.
Drive reliability: implement automated testing, chaos engineering, and progressive rollout strategies for new models.
Work cross-functionally with ML researchers, product engineers, and real customers to build infrastructure that actually matters.
Have 2–5+ years of experience building scalable backend or infra systems in production environments
Bring a builder mindset — you like owning projects end-to-end and thinking deeply about data, scale, and maintainability
Have transitioned ML or data-heavy prototypes to production, balancing speed and robustness
Are comfortable with data engineering workflows: parsing, transforming, indexing, and querying structured or unstructured data
Have some exposure to search infrastructure or LLM-backed systems (e.g., document retrieval, RAG, semantic search)
Experience with vector databases (e.g., pgvector, Pinecone, Weaviate) or inverted-index search (e.g., Elasticsearch, Lucene)
Hands-on with GPU orchestration (Kubernetes, Ray, KServe) or model-parallel inference tuning
Familiarity with Go / Rust (primary stack), with some TypeScript for light full-stack tasks
Deep knowledge of observability tooling (OpenTelemetry, Grafana, Datadog) and profiling distributed systems
Contributions to open-source ML or systems infrastructure projects
-
Infrastructure Engineer, Data Platform
2 weeks ago
San Francisco, California, United States Together AI Full time
About the Role
Together AI is hiring an Infrastructure Engineer to own and operate the data platform that powers our rapidly scaling data platforms. In this role, you will be the primary engineer responsible for defining, building, and maintaining the AWS infrastructure that underpins data engineering systems across the company, from internal analytics...
-
San Mateo, California, United States Fireworks AI Full time
About Us:
At Fireworks, we're building the future of generative AI infrastructure. Our platform delivers the highest-quality models with the fastest and most scalable inference in the industry. We've been independently benchmarked as the leader in LLM inference speed and are driving cutting-edge innovation through projects like our own function calling and...
-
Senior AI Cloud Infrastructure Engineer
4 days ago
San Francisco, California, United States Hamilton Barnes 🌳 Full time
Senior AI Cloud Infrastructure Engineer (GPU Compute)
Join a top-tier, fast-growing technology company to architect and manage the critical GPU infrastructure that powers all of their Machine Learning and AI initiatives. This is a high-impact, hands-on role where you will design and scale the entire cloud ecosystem, from bare-metal hardware provisioning and...
-
San Francisco, California, United States Baseten Full time
About Baseten
Baseten powers inference for the world's most dynamic AI companies, like OpenEvidence, Clay, Mirage, Gamma, Sourcegraph, Writer, Abridge, Bland, and Zed. By uniting applied AI research, flexible infrastructure, and seamless developer tooling, we enable companies operating at the frontier of AI to bring cutting-edge models into production. With...
-
AI Infrastructure Engineer
6 days ago
San Francisco, California, United States PlayerZero Full time
About PlayerZero
PlayerZero is building a self-healing system for software, automating defect detection, diagnosis, and remediation so developers ship with confidence. Teams use PlayerZero to spot issues before customers do, pinpoint root causes fast, and close the loop from incident to fix. Our platform includes capabilities like Agentic Debugging and Code...
-
San Francisco, California, United States Essential AI Full time
About Us
Essential AI is building an open platform to fuel and accelerate AI breakthroughs globally. Our open models, robust tooling, reproducible pipelines, and evaluation frameworks are designed for collaboration and contribution, empowering others to build, iterate, and innovate faster. Essential AI's technology and products have the means to shape AI...
-
San Francisco, California, United States Together AI Full time
Together AI is building the AI Acceleration Cloud, an end-to-end platform for the full generative AI lifecycle, combining the fast, reliable inference and model shaping services with state-of-the-art AI cloud infrastructure. As a Staff Product Manager, you will play a key role in building the next generation AI cloud platform – a highly available, global,...
-
AI Research Infrastructure Engineer
3 days ago
San Francisco, California, United States Block Full time
Each of our brands unlocks different aspects of the economy for more people. Square makes commerce and financial services accessible to sellers. Cash App is the easy way to spend, send, and store money. Afterpay is transforming the way customers manage their spending over time. TIDAL is a music platform that empowers artists to thrive as entrepreneurs....
-
Founding Senior Infrastructure Engineer
1 week ago
San Francisco, California, United States Retell AI Full time
About Retell AI
Retell AI is using first principles to reimagine the call center with cutting-edge voice AI. We believe voice is still the most natural way humans communicate, yet it has been trapped in outdated call centers for decades. Our mission is to bring intelligence, empathy, and speed to every phone conversation between businesses and their...
-
AI Engineer
1 week ago
San Francisco, California, United States Autospark AI Full time
Company Description
Autospark AI develops AI as a Service (AIaaS) solutions that enable small and medium-sized businesses to harness the power of advanced multi-agent AI systems. Our technology supports growth, optimizes marketing efforts, and improves operational efficiencies for clients. We are committed to making AI accessible and impactful for businesses...