Senior Site Reliability Engineer

4 weeks ago


Greater Denver Area, United States Stack Overflow Full time

Every developer has a tab open on Stack Overflow.

We are one of the most popular websites in the world - a community-based space focused on increasing productivity, decreasing cycle times, accelerating time to market, and protecting institutional knowledge.

Innovation is at the heart of everything we do. We embrace collaboration and transparency, and we believe in leading with empathy, creating an environment where every Stacker knows they belong. We recognize that the unique contributions and points of view of all Stackers contribute to our success.

We are a Best Company to Work For, in addition to being recognized for Best Company Leadership, Best Company Happiness, Best Company Perks and Benefits, Best Company Work-Life Balance, Best Company Compensation, and Best Company Outlook.

We are a remote-first company with Hiring HUBs based in the US, Canada, UK, and Germany.


Stack Overflow is growing fast, and our infrastructure needs keep getting bigger as our products scale and grow. We're looking for a Site Reliability Engineer to join our existing team of SREs and developers to help us grow our cloud infrastructure as we transition away from our on-premise footprint. As an SRE, you will collaborate with application development teams to identify gaps and opportunities to improve reliability across our products, always looking for ways to automate manual work and create repeatable, scalable systems and processes. We want you to suggest solutions and build tools to measure and monitor the reliability of our products.

We're looking for someone with deep experience with the Google Cloud Platform and familiarity with the .NET ecosystem. We don't expect you to know every other part of our stack coming in, so we'll pair you with other members of the team to learn and develop your skills across our entire infrastructure (including our non-cloud infrastructure). We operate in mixed Windows and Linux environments, and expect someone in this role to have experience with one environment and a working understanding of the other. Experience with any of Networking/VPN, Elasticsearch, Redis, Azure, or Terraform is a plus.

What you'll work on:

  • Leverage your GCP experience to support our multi-cloud solutions as we migrate our on-premise footprint to the cloud.
  • Manage a high-quality production platform and promotion pipeline that ensures capacity for our users.
  • Reduce toil through software solutions by removing or automating manual tasks, steps, and workflows as we further streamline deployments and upgrades.
  • Improve the observability of our systems to help identify issues or bottlenecks by iterating on our monitoring and alerting strategies.
  • Improve our security patching and compliance strategy for cloud solutions.
  • Participate in our on-call rotation (typically 1 fortnight out of 4 months).
  • Partner closely with your peers to accomplish goals within an agile software development lifecycle.

Our current ecosystem includes:

  • Microsoft Azure
  • Self-hosted infrastructure
  • Terraform, PowerShell, Python, Go
  • Windows Server, IIS, and .NET Core
  • Linux - CentOS
  • Our toolchain includes: GitHub, TeamCity (CI), GitHub Actions (GHA), Octopus Deploy, HAProxy / NGINX, Elasticsearch, Redis, Argo Workflows, Kubernetes, Datadog

Skills & Requirements

We're looking for:

  • Experience working in Google Cloud Platform, with a proven track record of designing and deploying modular cloud-based systems that leverage relevant GCP technology.
  • Experience with Terraform or similar IaC tools.
  • Experience writing mature software solutions in a high-level programming language (for example, but not limited to, Python, Golang, C#).
  • A track record of getting stuff done with an emphasis on mentoring and technical leadership.
  • A strong practical understanding of software development lifecycle phases, from planning and development through production deployment and monitoring.
  • Experience with Agile methodologies such as Scrum, XP, or Kanban.
  • Willingness to learn new technologies and adapt to changing priorities.
  • Eagerness and ability to work with different types of functional groups, share knowledge, collaborate, and contribute. This is particularly important given our remote-first environment.

We like to see:

  • Expertise in scripting languages (Bash, PowerShell)
  • SQL experience (Microsoft SQL Server a plus)
  • An understanding of service level indicators and service level objectives

What you'll get in return:

  • Competitive Base Salary
  • Generous paid vacation
  • Generous parental leave (16 weeks at 100% pay), family care leave, and unlimited sick days
  • Equity (RSUs) for all employees at all levels
  • Industry-leading health benefits that are applicable per country of residence for all our full-time employees
  • Company-paid Life Insurance
  • Home Internet stipend
  • Professional allocation for your growth and development
  • One-time allowance to assist with your home office setup
  • Company-paid access to Calm, Bravely, LinkedIn Learning, MyAcademy and Overdrive

Stack Overflow is proud to be an equal opportunity workplace. We value diversity, inclusion, equity and belonging and these pillars are at the heart of how we work together here at Stack. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, or any other applicable legally protected characteristics in the location in which the candidate is applying.

For individuals based in California, and other locations where required, we will consider for employment qualified applicants with arrest and conviction records.


