Sr/Staff Site Reliability Engineer, Consumer Apps

4 hours ago

Redwood City, California, United States Attain Full time

About Attain

Built for consumers and companies alike

Klover's engineering team powers one of the fastest-growing fintech platforms in the U.S., supporting over one million active users each month. Our systems process and move more than $1.5 billion annually, enabling real-time access to financial tools, rewards, and services that help people improve their day-to-day lives.

As part of this team, you'll help design, build, and scale the systems that underpin Klover's core products and platform. You'll work on high-impact, production-grade systems that prioritize reliability, security, and performance, and that integrate with a broad ecosystem of internal and external services. The work you do will directly shape how users interact with Klover's products, access their money, and experience transparent, low-fee financial services.

Klover engineers collaborate closely with colleagues across backend, frontend, data science, and product teams to deliver scalable, high-quality solutions for a rapidly growing user base. You'll have the opportunity to work with modern technologies and architectures while helping define and evolve the next generation of inclusive, data-powered financial products—building systems and interfaces that emphasize reliability, privacy, and performance at scale.

About the Role

As a Senior/Staff Site Reliability Engineer, you will play a critical role in building and maintaining the infrastructure that powers all of our systems, along with the supporting tools that keep those systems running smoothly. You will work closely with nearly every engineering team at Attain to keep our systems operating at peak efficiency and to prepare us for the scale of our future growth.

Attain Office Hybrid Schedule (where applicable):

Redwood City, CA: Mondays (in-office for stand-ups and all-hands) plus your choice of three days between Tuesday and Friday
Chicago, IL & New York, NY: 4 days in-office; 1 day remote

What a typical week might look like

Write Terraform modules for deploying infrastructure resources via our GitLab pipelines
Develop Helm charts for deploying services and jobs in our Kubernetes cluster
Define metrics, network policies, and routing rules for our Istio service mesh
Monitor and maintain our GCP BigQuery and Spanner databases
Pipe metrics to our Google-managed Prometheus instance and build out Grafana dashboards and alerts to increase visibility on our systems
Experiment with GCP offerings, 3rd party vendors, and open-source tools to further automate and secure day-to-day operations
Leverage the latest LLMs in developing infrastructure and tooling
Pair with engineering leads to instrument and monitor critical functionality
Add automation to both existing and new systems to reduce our reliance on manual processes
Participate in architecture design and capacity planning discussions to ensure that our systems are scalable, maintainable, reliable, and secure
Build, maintain, and improve our CI/CD pipeline

You'll be a great fit for the role if

You are comfortable wearing many hats
You have a willingness to learn and teach in a fast-paced, collaborative environment
You have a strong desire to automate things
You readily provide constructive feedback, and also proactively seek feedback to improve yourself
You like to get your hands dirty and tinker with/stress test new technologies

Preferred Qualifications

6+ years of experience building and maintaining large-scale cloud-native infrastructure (AWS and/or GCP)
Experience working with the containerization technologies Docker, Kubernetes, and Istio or a similar service mesh technology
Experience with SQL database technologies such as MySQL, Google BigQuery, and Google Spanner
Experience with stream technologies such as Kafka and Amazon Kinesis
Experience with pub/sub technologies such as AWS SNS and Google Pub/Sub
Experience with serverless computing technologies such as AWS Lambda and Google Cloud Functions/Google Cloud Run
Experience with infrastructure-as-code tools such as Terraform
Experience with observability tools such as Datadog, Prometheus, and Grafana
Strong computer science and software engineering fundamentals
Experience with SOC 2 compliance processes and requirements

We are excited to hear from you.

At Attain, we are passionate about finding people to continuously help us grow our organization. We encourage you to apply, even if your experience doesn't match every detail on the job description. If we don't see something that immediately fits, we will keep your resume on file for future opportunities.

