Senior Site Reliability Engineer

3 weeks ago


Denver, United States | Guidewire | Full time

Guidewire is searching for a Sr. Site Reliability Engineer who is hungry for a rare chance to transform insurance with the industry's leading Analytics platform. As a member of the SRE-Analytics Team, you'll be responsible for building and evolving our SRE practice for Analytics. The Analytics team at Guidewire uses internet-scale data collection, adaptive machine learning, generative artificial intelligence (Gen AI), and insurance risk modeling capabilities to help insurers and other financial institutions model evolving risks, develop new products, and make better business decisions. This role is a great opportunity for individuals motivated by learning cutting-edge technologies and applying them to real-world business problems. Guidewire is the AWS for the insurance companies that use our platforms and applications. The solutions you and this team develop will be used by hundreds of insurance companies and will affect billions of dollars in annual transactions.

Downtime and failures are inevitable; what matters is how SREs deal with them. SREs are a blend of pragmatic operators and software craftspeople who apply sound engineering principles, operational discipline, and mature automation to our operating environments. Part of an SRE's responsibility is to collaborate with developers to troubleshoot and solve problems and to reduce customer impact where possible. After an incident, SREs also need to go one step further: document and examine what went wrong, and develop measures such as automated runbooks to handle the issue going forward.

When on-call, you will be responsible for:

  • Responding to any critical incidents and ticket escalations.
  • Following and documenting our post-incident response and postmortem processes.
  • Executing planned patching or improving related automation.
  • Engineering to reduce toil, tune alerts, and improve documentation.
When NOT on-call, you will be responsible for:
  • Engineering to re-platform or migrate layers of our infrastructure to Kubernetes ecosystems.
  • Analyzing our AWS infrastructure and related applications/services for design and architectural opportunities to improve overall reliability and cost intelligence.
  • Creating observability patterns so that all alerts have consistent content and configuration, keeping triage short and continuously improving overall MTTR.
  • Analyzing incident data to determine the next opportunity to improve reliability.
  • Influencing engineers to improve application reliability and scalability so that applications run efficiently.
  • Documenting every action, if not captured as code, so your findings turn into repeatable actions and then into automation.
  • Improving operational processes (such as deployments and upgrades) to make them as boring as possible.
Required Skills:
  • Proven experience triaging and debugging distributed systems on cloud infrastructure.
  • Proven experience designing and engineering CI/CD pipelines within K8s and legacy ecosystems.
  • Experience in building, deploying, and running scalable infrastructure within AWS and Kubernetes ecosystems using Terraform and other cloud native approaches.
  • Experience in designing and engineering monitors, dashboards, and synthetic testing.
  • Experience in managing infrastructure config at scale using multiple approaches and/or tools such as GitOps, Puppet, or Ansible.
  • Good understanding of AWS cloud networking and security with hands-on experience remediating infrastructure vulnerabilities at scale.
  • Comfortable with Linux system administration, with the ability to program/script using Python, Go, Java, shell, or equivalent.
  • Good verbal and written communication skills.
Preferred Skills:
  • SRE Certified in multiple categories.
  • AWS Certified in multiple categories.
  • Experience with Datadog Cloud Monitoring.
  • Proficiency with SQL, database administration, data pipelines, performance tuning, and schema design.
  • Proficiency with multiple pipelining tools such as TeamCity, Bitbucket Pipelines, Jenkins, and GitHub Actions.
  • Familiarity with open-source distributed data processing frameworks such as Hadoop and Apache Spark, and with data warehouses such as Amazon Redshift.

