Senior Software Engineer, Site Reliability Tooling

2 weeks ago


Remote, United States | Upstart | Full time

About Upstart

Upstart is the leading AI lending marketplace partnering with banks and credit unions to expand access to affordable credit. By leveraging Upstart's AI marketplace, Upstart-powered banks and credit unions can have higher approval rates and lower loss rates across races, ages, and genders, while simultaneously delivering the exceptional digital-first lending experience their customers demand. More than 80% of borrowers are approved instantly, with zero documentation to upload.

Upstart is a digital-first company, which means that most Upstarters live and work anywhere in the United States. However, we also have offices in San Mateo, California; Columbus, Ohio; and Austin, Texas.

Most Upstarters join us because they connect with our mission of enabling access to effortless credit based on true risk. If you are energized by the impact you can make at Upstart, we’d love to hear from you.

The Team

Upstart’s Site Reliability Engineering (SRE) team owns the reliability, resiliency, and observability of Upstart’s production systems. The SRE team builds tooling and automation to monitor the health of our infrastructure and create a fast, reliable, and productive environment for other engineers and a world-class experience for our customers. SRE defines Upstart’s strategy for technology operations risk mitigation, which includes disaster planning and on-call procedures. We use data-driven approaches to drive our decisions, and provide reports and insights to the business to improve visibility into the system and customer experience.

As a Senior Software Engineer focused on Site Reliability Tooling, your work will directly impact the success of the SRE team and all of Upstart. Your expertise will inform the team’s direction, and your work with other SREs and Upstart engineers will make Upstart’s systems as effective as possible for our customers. SRE at Upstart is ever-changing, and you will be a primary contributor in shaping our future path.

How you’ll make an impact:

  • Embody and share SRE principles at Upstart
  • Exercise state-of-the-art SRE practices throughout the company
  • Uphold a culture of visibility, ownership, and responsibility around service reliability
  • Implement standards for monitoring microservices, web apps, mobile apps, databases, Kubernetes clusters, and machine learning platforms, in a fast-paced environment
  • Improve incident response practices, both within SRE and throughout the company
  • Automate away toil where automation makes sense

What we’re looking for:

Minimum requirements:

  • Minimum of 6 years of combined experience across Software Engineering, Site Reliability, and/or DevOps Engineering, including CI/CD, TDD, internal tooling, observability, and other agile development practices
  • Proficiency coding in Python, Go, JavaScript/TypeScript
  • Proficiency with Infrastructure as Code (Terraform, CDK, CloudFormation, etc.)
  • Software engineering background with experience building internal tooling from scratch and with other agile development techniques
  • Strong software design & architecture skills
  • Fundamentally sound with data structures & algorithms
  • Experience with on-call and incident management environments
  • Experience with observability, monitoring, and reporting tools (e.g., Datadog, Prometheus)
  • Experience supporting SaaS software in a microservice-oriented cloud environment
  • Ability to work with multiple teams for enterprise-wide deliverables
  • Data/metrics-driven mindset

Preferred qualifications:

  • Experience with service mesh
  • Full Stack development skills
  • Experience building tooling for an observability platform
  • Experience leveraging LLM/GenAI to improve SRE efficiency and processes

Position Location - This role is available in the following locations: Remote, San Mateo, Columbus, Austin

Time Zone Requirements - This team operates across all U.S. time zones.

Travel Requirements - This team has regular on-site collaboration sessions. These occur 3 days per quarter at an Upstart office. If you need to travel to make these meetups, Upstart will cover all travel-related expenses.

What you'll love:

  • Competitive Compensation (base + bonus & equity)
  • Comprehensive medical, dental, and vision coverage with Health Savings Account contributions from Upstart
  • 401(k) with 100% company match up to $4,500 and immediate vesting and after-tax savings
  • Employee Stock Purchase Plan (ESPP)
  • Life and disability insurance
  • Generous holiday, vacation, sick and safety leave
  • Supportive parental, family care, and military leave programs
  • Annual wellness, technology & ergonomic reimbursement programs
  • Social activities including team events and onsites, all-company updates, employee resource groups (ERGs), and other interest groups such as book clubs, fitness, investing, and volunteering
  • Catered lunches + snacks & drinks when working in offices

At Upstart, your base pay is one part of your total compensation package. The anticipated base salary for this position is expected to be within the range below. Your actual base pay will depend on your geographic location: with our “digital first” philosophy, Upstart uses compensation regions that vary depending on location. Individual pay is also determined by job-related skills, experience, and relevant education or training. Your recruiter can share more about the specific salary range for your preferred location during the hiring process. In addition, Upstart provides employees with target bonuses, equity compensation, and generous benefits packages (including medical, dental, vision, and 401k).

United States | Remote - Anticipated Base Salary Range: $163,600 to $226,400 USD


