Sr/Staff Site Reliability Engineer, Consumer Apps
4 hours ago
About Attain
Built for consumers and companies, alike
Klover's engineering team powers one of the fastest-growing fintech platforms in the U.S., supporting over one million active users each month. Our systems process and move more than $1.5 billion annually, enabling real-time access to financial tools, rewards, and services that help people improve their day-to-day lives.
As part of this team, you'll help design, build, and scale the systems that underpin Klover's core products and platform. You'll work on high-impact, production-grade systems that prioritize reliability, security, and performance, and that integrate with a broad ecosystem of internal and external services. The work you do will directly shape how users interact with Klover's products, access their money, and experience transparent, low-fee financial services.
Klover engineers collaborate closely with colleagues across backend, frontend, data science, and product teams to deliver scalable, high-quality solutions for a rapidly growing user base. You'll have the opportunity to work with modern technologies and architectures while helping define and evolve the next generation of inclusive, data-powered financial products—building systems and interfaces that emphasize reliability, privacy, and performance at scale.
About the Role
As a Senior/Staff Site Reliability Engineer, you will play a critical role in building and maintaining the infrastructure that powers all of our systems, along with the supporting tools that keep those systems running smoothly. You will work closely with nearly every engineering team at Attain to ensure that our systems operate at peak efficiency and are ready to handle the scale of our future growth.
Attain Office Hybrid Schedule (where applicable):
- Redwood City, CA: Mondays (in-office for stand-ups and all-hands) plus a choice of three days from Tuesday through Friday
- Chicago, IL & New York, NY: 4 days in-office; 1 day remote
- Write Terraform modules for deploying infrastructure resources via our GitLab pipelines
- Develop Helm charts for deploying services and jobs in our Kubernetes cluster
- Define metrics, network policies, and routing rules for our Istio service mesh
- Monitor and maintain our GCP BigQuery and Spanner databases
- Pipe metrics to our Google-managed Prometheus instance and build out Grafana dashboards and alerts to increase visibility into our systems
- Experiment with GCP offerings, third-party vendors, and open-source tools to further automate and secure day-to-day operations
- Leverage the latest LLMs in developing infrastructure and tooling
- Pair with engineering leads to instrument and monitor critical functionality
- Add automation to both existing and new systems to reduce our reliance on manual processes
- Participate in architecture design and capacity planning discussions to ensure that our systems are scalable, maintainable, reliable, and secure
- Build, maintain, and improve our CI/CD pipeline
- You are comfortable wearing many hats
- You have a willingness to learn and teach in a fast-paced, collaborative environment
- You have a strong desire to automate things
- You readily provide constructive feedback, and also proactively seek feedback to improve yourself
- You like to get your hands dirty and tinker with/stress test new technologies
- 6+ years of experience building and maintaining large-scale cloud-native infrastructure (AWS and/or GCP)
- Experience working with the containerization technologies Docker, Kubernetes, and Istio or a similar service mesh technology
- Experience with SQL database technologies such as MySQL, Google BigQuery, and Google Spanner
- Experience with stream technologies such as Kafka and Amazon Kinesis
- Experience with pub/sub technologies such as AWS SNS and Google Pub/Sub
- Experience with serverless computing technologies such as AWS Lambda and Google Cloud Functions/Google Cloud Run
- Experience with infrastructure-as-code tools such as Terraform
- Experience with observability tools such as Datadog, Prometheus, and Grafana
- Strong computer science and software engineering fundamentals
- Experience with SOC 2 compliance processes and requirements
We are excited to hear from you.
At Attain, we are passionate about finding people to continuously help us grow our organization. We encourage you to apply, even if your experience doesn't match every detail on the job description. If we don't see something that immediately fits, we will keep your resume on file for future opportunities.