Site Reliability Engineer

3 days ago

San Francisco, California, United States Air Apps Full time $150,000 - $250,000 per year

About Air Apps

At Air Apps, we believe in thinking bigger—and moving faster. We're a family-founded company on a mission to create the world's first AI-powered Personal & Entrepreneurial Resource Planner (PRP), and we need your passion and ambition to help us change how people plan, work, and live. Born in Lisbon, Portugal, in 2018—and now with offices in both Lisbon and San Francisco—we've remained self-funded while reaching over 100 million downloads worldwide.

Our long-term focus drives us to challenge the status quo every day, pushing the boundaries of AI-driven solutions that truly make a difference. Here, you'll be a creative force, shaping products that empower people across the globe.

Join us on this journey to redefine resource management—and change lives along the way.

The Role

As a Site Reliability Engineer (SRE) at Air Apps, you will be responsible for ensuring the reliability, availability, and scalability of our systems. You will work at the intersection of software development and operations, implementing automation, monitoring, and performance optimization strategies to minimize downtime and improve system resilience.

Responsibilities

Design and implement scalable, reliable, and fault-tolerant systems across cloud environments.
Develop and maintain observability tools, including monitoring, logging, and alerting (e.g., Prometheus, Grafana, Datadog, ELK).
Automate infrastructure provisioning, deployment, and incident response using Infrastructure as Code (IaC) tools like Terraform or CloudFormation.
Optimize system performance, scalability, and incident response workflows to improve uptime.
Work closely with development and DevOps teams to improve system design for reliability.
Conduct root cause analysis (RCA) and implement preventative measures to minimize failures.
Ensure high availability by designing and maintaining load balancing, failover, and disaster recovery strategies.
Improve CI/CD pipelines to enhance deployment speed while maintaining stability.
Optimize cloud cost and resource utilization for AWS, Azure, or Google Cloud Platform (GCP).
Participate in on-call rotations to quickly address system failures and minimize downtime.

Requirements

Approximately 4 or more years of experience in Site Reliability Engineering (SRE), DevOps, or Systems Engineering.
Strong knowledge of cloud platforms (AWS, Azure, or GCP) and cloud-native architectures.
Experience with observability and monitoring tools (Prometheus, Grafana, ELK, Datadog, New Relic).
Proficiency in Infrastructure as Code (IaC) tools such as Terraform, CloudFormation, or Pulumi.
Hands-on experience with containerization and orchestration (Docker, Kubernetes, Helm).
Strong Linux system administration and networking fundamentals.
Experience with incident management, debugging, and root cause analysis.
Proficiency in scripting (Bash, Python, or Go) for automation and system monitoring.
Knowledge of load balancing, failover strategies, and distributed systems.
Understanding of security best practices, access control, and compliance requirements.
Strong communication skills and the ability to collaborate with cross-functional teams.

What benefits are we offering?

Apple hardware ecosystem for work.
Annual Bonus.
Medical Insurance (including vision & dental).
Short- and long-term disability insurance.
401(k) with up to 4% contribution.
Air Conference – an opportunity to meet the team, collaborate, and grow together.
Transportation budget.
Free meals at the hub.
Gym membership.

Diversity & Inclusion

At Air Apps, we are committed to fostering a diverse, inclusive, and equitable workplace. We enthusiastically welcome applicants from all backgrounds, experiences, and perspectives. We celebrate diversity in all its forms and believe that varied voices and experiences make us stronger.

Application Disclaimer

At Air Apps, we value transparency and integrity in our hiring process. Applicants must submit their own work without any AI-generated assistance. Any use of AI in application materials, assessments, or interviews will result in disqualification.

Senior Site Reliability Engineer

2 days ago

San Francisco, California, United States Sibitalent Corp Full time $180,000 - $250,000 per year

Job Title: Staff Site Reliability Engineer (SRE). Location: San Francisco, CA (Hybrid, Local Only). Duration: 6+ months Contract. 12+ Years of profile. W2 OR C2C (Either will work). Job Description: As our Staff SRE, you'll be the primary expert responsible for our entire compute ecosystem. Your key responsibilities will include: As a Staff SRE, you'll operate at the...

Staff Site Reliability Engineer

3 days ago

San Francisco, California, United States Heartflow Full time $185,750 - $250,922 per year

Heartflow is a medical technology company advancing the diagnosis and management of coronary artery disease, the #1 cause of death worldwide, using cutting-edge technology. The flagship product, an AI-driven, non-invasive cardiac test supported by the ACC/AHA Chest Pain Guidelines called the Heartflow FFRCT Analysis, provides a color-coded, 3D model of a...

Principal Site Reliability Engineer

4 days ago

San Francisco, California, United States Harrison Clarke Full time $120,000 - $180,000 per year

Harrison Clarke are working with several high-profile companies that are seeking a Principal Site Reliability Engineer (SRE) to lead the design, implementation, and scaling of the infrastructure and systems that support their products. The ideal candidate should have extensive experience in designing highly scalable infrastructure, building systems, and...

Founding Site Reliability Engineer

3 days ago

San Francisco, California, United States Reducto Full time $120,000 - $180,000 per year

About Reducto: Reducto helps AI teams ingest real-world enterprise data with state-of-the-art accuracy. The vast majority of enterprise data, from financial statements to health records, is locked in unstructured file formats like PDFs and spreadsheets. We train vision models to read those documents the way a human would, and make it possible to build...

Site Reliability Engineer

1 day ago

San Diego, California, United States SPECTRAFORCE Full time $120,000 - $180,000 per year

Role: Site Reliability Engineer (Only on W2). Location: San Diego, CA - Onsite. Duration: 12 Months. Job Description: The Site Reliability Engineer (SRE) will work closely with cross-functional teams, including software development, platform, and operations, to support the availability and performance of our cloud-based systems. You will take ownership of the cloud...

Software Engineer, Protected Data Site Reliability Engineering

3 days ago

San Francisco, California, United States Google Full time $141,000 - $202,000

Minimum qualifications: Bachelor's degree in Computer Science, a related field, or equivalent practical experience. 2 years of experience with software development in one or more programming languages. Preferred qualifications: Master's degree in Computer Science or Engineering. 2 years of experience designing, analyzing, and troubleshooting distributed...

Site Reliability Engineer

5 hours ago

San Diego, California, United States A-Line Staffing Solutions Full time $144,000 per year

Site Reliability Engineer (SRE). Location: San Diego, CA (Hybrid). Rate: $70–80/hr on W-2 (No C2C). Overview: We are seeking an experienced Site Reliability Engineer (SRE) to join our cross-functional team supporting cloud-based systems in a regulated healthcare environment. This role is ideal for an engineer who thrives on automation, scalability, observability, and...

Software Engineer, Protected Data Site Reliability Engineering

2 days ago

San Francisco, California, United States Google Full time $141,000 - $202,000 per year

Applicants in San Francisco: Qualified applications with arrest or conviction records will be considered for employment in accordance with the San Francisco Fair Chance Ordinance for Employers and the California Fair Chance Act. Minimum qualifications: Bachelor's degree in Computer Science, a related field, or equivalent practical experience. 2 years of...

Staff Site Reliability Engineer, Network

4 days ago

San Francisco, California, United States Crusoe Full time $204,000 - $247,000

Crusoe's mission is to accelerate the abundance of energy and intelligence. We're crafting the engine that powers a world where people can create ambitiously with AI, without sacrificing scale, speed, or sustainability. Be a part of the AI revolution with sustainable technology at Crusoe. Here, you'll drive meaningful innovation, make a tangible impact,...

Senior Staff Site Reliability Engineer

3 days ago

San Francisco, California, United States Quizlet Full time $258,000 - $314,000 per year

About Quizlet: At Quizlet, our mission is to help every learner achieve their outcomes in the most effective and delightful way. Our $1B+ learning platform serves tens of millions of students every month, including two-thirds of U.S. high schoolers and half of U.S. college students, powering over 2 billion learning interactions monthly. We blend cognitive...
