Site Reliability Engineering Manager

1 week ago


San Diego, United States | Art of Problem Solving | Full time
Job Description

As the Site Reliability Engineering Manager, you'll play a key role in supporting and scaling the technology that helps us discover, inspire, and train the great problem solvers of the next generation. In this position, you will lead our cloud modernization efforts and maintain existing infrastructure across all of our products and services, supporting a growing user base currently numbering around one million. This position is ideal for a detail-oriented and strategic engineering leader who will set and execute our cloud infrastructure strategy alongside their team of two Site Reliability Engineers. This is a hybrid full-time position based at our headquarters in San Diego, CA.

The Site Reliability Engineering Manager:

  • Manages a team of Site Reliability Engineers, including hiring, evaluating, training, and developing their team members, as well as building a collaborative and productive team culture.
  • Owns and maintains company cloud infrastructure strategy and SRE team roadmap.
  • Implements and evaluates reliability metrics for our products and services, and advocates for projects to reduce our exposure to, or better understand, reliability risks.
  • Runs, evaluates, and improves SRE processes and procedures such as task workflow, reviews, and launches; manages regular team responsibilities; and leads the maintenance of team documentation.
  • Provides technical expertise by collaborating with stakeholders to make high-level decisions related to their team, providing technical direction to team members, and serving as a knowledge resource for their team.
  • Allocates team resources by mapping team members to tasks and projects, helping estimate time for their team members to complete projects, and advocating for engineering resources as needed.
  • Drives continuous improvement in the SRE space and the broader Engineering Department by proposing and advocating for projects that will improve reliability, security, and/or maintainability, improve development workflow, remove operational bottlenecks, or otherwise improve engineering department bandwidth.
  • Is accountable for overall risk management and reduction practices and contributes to risk management practices in other engineering teams.
  • Communicates cross-team by being the main point of contact between the SRE team and other engineering teams, and between their team and company stakeholders. Facilitates connections between their team members and other teams, and regularly works with engineering managers, engineering team leads, and project managers.
  • Performs all the duties of a Site Reliability Engineer.

The ideal candidate has:

  • Expert-level experience planning, designing, implementing, securing, and monitoring scalable infrastructure for web applications in the AWS ecosystem
  • Experience leading technical strategy and execution in projects
  • Experience deploying and managing Infrastructure-as-Code with Terraform
  • Familiarity with Node.js (preferred) and/or PHP
  • Familiarity with MariaDB, PostgreSQL, Redis, Apache, and nginx or similar technologies
  • Prior full-stack or backend software engineering experience is preferred
  • Prior people management experience, especially in an SRE or DevOps role, is preferred

Why Join AoPS:

The full salary range for this position is $154k-$187k, with a 6% year-end bonus. Here are some things you can look forward to:

  • Impact: The opportunity to drive the reliability and scalability of our infrastructure, supporting our growing number of customers
  • Culture: Work and collaborate with an organization filled with builders and life-long learners who strive to discover, inspire, and train the great problem solvers of the next generation
  • Flexibility: Casual work environment with a hybrid work week and flexible scheduling
  • Benefits: Multiple options for Medical, Dental and Vision plans
  • Future Planning: 401K with company match
  • Quality of Life: PTO Plan and supportive leadership that gives you the work-life balance you deserve
  • Ease of Transition: Relocation bonus (if currently located outside of San Diego)

Background Check:

Please note that employment is contingent on the successful completion of a background check.

About AoPS:

Art of Problem Solving (AoPS) is on a mission to discover, inspire, and train the great problem solvers of the next generation. Since 2003, we have trained hundreds of thousands of the country's top students, including nearly all the members of the US International Math Olympiad team, through our online school, in-person academies, textbooks, and online learning systems. While math has been our primary focus for most of our history, over the years we have expanded our unique problem-solving curriculum into more subjects, such as language arts, science, and computer science.


