Founding Site Reliability Engineer
15 hours ago
About Us
At Relevance AI, we're building the home of the AI workforce.
Our mission is simple: empower every team to delegate meaningful work to AI agents that think, act, and collaborate like experts.
With Relevance AI, anyone can create and manage intelligent agents that handle workflows, decisions, and collaboration - all within one unified platform. Our technology already powers industry leaders such as Canva, Databricks, Confluent, Autodesk, Lightspeed, Rakuten, Aveva, Qualified, and Activision Blizzard, helping them scale excellence across operations, marketing, and sales.
We're backed by Bessemer Venture Partners, Insight Partners, Peak XV, and King River Capital, and raised our Series B in April 2025 to accelerate growth and push the boundaries of agentic automation.
Headquartered in San Francisco and Sydney, we operate on a hybrid model and thrive on curiosity, collaboration, and execution - we move fast, think big, and win together.
This year, we were proud to be named LinkedIn's #1 Startup in Australia.
If you want to define how the world works with AI, join us.
The Role
We're looking for a Founding Site Reliability Engineer to join us as our first SRE hire in San Francisco. We are open to hiring at Senior, Lead, or Principal level, and the level will be candidate-led. This role is perfect for someone ready to establish and scale the SRE discipline from the ground up in one of the fastest-growing AI companies globally.
You'll own the reliability, scalability, and security of our platform as we power tens of thousands of multi-agent workloads across multiple regions. You'll partner closely with our founders, engineering leads, and product teams to define our reliability culture, shape long-term strategy, and build world-class infrastructure for enterprise scale.
What You'll Do
- Own the SRE function, establishing best practices, tooling, and culture
- Tackle reliability challenges unique to multi-agent orchestration at enterprise scale
- Guarantee >99.9% uptime of production systems, ensuring reliability at global scale
- Architect and automate AWS infrastructure with Terraform and CI/CD pipelines
- Design observability systems across microservices, APIs, and vector infrastructure (metrics, tracing, logging)
- Drive down incidents and MTTR through runbooks, alerting, and incident response excellence
- Help scale infra to support hundreds of thousands of agents and billions of API calls
- Partner with engineering teams to embed SRE principles into the SDLC and shape org-wide reliability strategy
- Act as a founding voice in our SF office, influencing product direction and engineering culture
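To put the uptime target above in concrete terms, here is a minimal sketch of the arithmetic behind a 99.9% availability goal. The function name `downtime_budget_minutes` and the 30-day and 365-day windows are illustrative assumptions, not part of Relevance AI's tooling or how the team actually measures uptime.

```python
# Hypothetical helper (not from the posting): converts an availability
# target into the amount of downtime it permits over a given window.
def downtime_budget_minutes(availability: float, period_days: float) -> float:
    """Return the minutes of downtime allowed over period_days.

    availability is a fraction between 0 and 1 (0.999 means 99.9%).
    """
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    total_minutes = period_days * 24 * 60
    return total_minutes * (1.0 - availability)


if __name__ == "__main__":
    # Assumed measurement windows, chosen only for illustration.
    for label, days in (("30-day month", 30), ("365-day year", 365)):
        budget = downtime_budget_minutes(0.999, days)
        print(f"{label}: {budget:.1f} minutes of allowed downtime")
```

At 99.9%, the budget works out to roughly 43 minutes per 30-day month, which is why fast detection and a low MTTR matter as much as preventing incidents in the first place.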
Requirements
- 5+ years in SRE/DevOps/Infrastructure roles, with experience in enterprise SaaS environments.
- Deep AWS expertise (EC2, ECS/EKS, Lambda, RDS, VPC, IAM).
- Proven track record with Infrastructure as Code (Terraform, Kubernetes/EKS, CDK, or CloudFormation).
- Hands-on with observability stacks (CloudWatch, Grafana, Prometheus, Datadog).
- Incident management experience in production SaaS systems, including on-call, postmortems, and reliability improvements.
- Bonus: Prior exposure to AI/ML platforms, data-heavy systems, or multi-agent workloads.
Tech stack: AWS, Kubernetes/EKS, Terraform, GitHub Actions, Postgres/Mongo, Prometheus/Grafana, CloudWatch, PagerDuty/BetterStack
Benefits
- Health Insurance Contribution - Relevance AI contributes to the cost of individual medical, dental, and vision insurance for employees.
- Commuter Benefits - Save on your commute with pre-tax deductions for transit and parking expenses
- Unlimited Annual Leave - Flexible time off policy to rest, recharge, and take care of what matters most
- ESOP - Employee Stock Ownership Plan so you can grow with the company
- AI Productivity Benefit - Get up to $1,200 USD/year to spend on AI tools, courses, and learning resources that help you work smarter and grow your skills
- Parental Leave - We offer 12 weeks of paid parental leave for all eligible new parents, and an additional 6 weeks for the birthing parent
- Milestone Merch - Celebrate your work anniversaries with customised Relevance AI swag
- Food, Drinks & Community - Stay energised with free breakfasts, healthy snacks, and a fully stocked fridge of drinks. Enjoy team lunches provided every Thursday and Friday, plus Uber Eats dinners and regular catered office meals throughout the week. As the home of the AI workforce, we also host vibrant community events featuring thought leaders, industry partners, and the wider tech community.
- Quarterly Team Events - Build stronger connections through fun, meaningful team bonding experiences every quarter
- Social Clubs - Share your hobbies and interests by joining or starting a club with your teammates. From hiking and chess to board game nights and social committee activities, there's something for everyone
- Sonder EAP - Access 24/7 mental health and wellbeing support through Sonder, our Employee Assistance Program