Site Reliability Engineer

6 days ago


Redwood City, United States 1872 Consulting Full time

Site Reliability Engineer - 100% Remote

Role Summary:

Site Reliability Engineers (SREs) work with different developer teams to keep our systems running smoothly. They are a blend of pragmatic operators and software craftspeople who apply excellent problem-solving and communication skills to develop or configure tools that automate, monitor, and alert on the reliability of internal systems.


What you will be doing:

  • Join the on-call rotation to respond to LeadIQ availability incidents and support developers with customer incidents.
  • Use your on-call shift to prevent incidents from happening. When they do happen, step in either actively or in support of the engineers.
  • Run our infrastructure with AWS, Terraform, and Kubernetes (EKS).
  • Think about systems - edge cases, failure modes, behaviors, specific implementations.
  • Build monitoring that alerts on symptoms, not on outages.
  • Document every action, so your findings turn into repeatable actions, and then into automation.
  • Improve the deployment process to make it as boring as possible.
  • Design, build, and maintain core infrastructure pieces that allow LeadIQ to scale to support hundreds of thousands of concurrent users.
  • Debug production issues across services and levels of the stack.
  • Plan the growth of LeadIQ infrastructure.
  • Support engineering teams in defining and building SLIs and SLOs.

The Requirements:

  • 4+ years working with Terraform and AWS
  • 2+ years working with:
    • GitLab (or similar) as a CI tool
    • Datadog (or similar) as an alerting tool
    • Kubernetes
  • Know your way around Linux and the Unix Shell.
  • Programming skills in Node.js and/or Go

Nice to Haves:

  • Experience with the tech stack: Nginx, Docker, Kubernetes, Terraform, Terragrunt, AWS, GitLab, Helm, ArgoCD, Datadog, or similar technologies
  • AWS, Terraform, Kubernetes certifications

  • Oklahoma City, Oklahoma, United States Ford Motor Company Full time

    Site Reliability Engineering at Ford Motor Company plays a critical role in maintaining and improving the reliability, scalability, and performance of our services. You will work closely with our development teams to build and maintain large-scale, distributed systems and ensure our products meet our high standards for availability and user...


  • Oklahoma City, United States PAYCOM PAYROLL LLC Full time

    Site reliability engineers will be dedicated full-time to creating software tools, metrics and processes that improve the reliability of applications, sites, and systems in production. The Site Reliability Engineer is primarily responsible for ensuring the integrity, functionality, and reliability of applications and sites. RESPONSIBILITIES Develop software to...


  • Oklahoma City, United States Paycom Payroll Llc Full time

    Site reliability engineers will be dedicated full-time to creating software tools, metrics and processes that improve the reliability of applications, sites, and systems in production. The Site Reliability Engineer is primarily responsible for ensuring the integrity, functionality, and reliability of applications and sites. RESPONSIBILITIES Develop software to...


  • Oklahoma City, United States Paycom Online Full time

    Site reliability engineers will be dedicated full-time to creating software tools, metrics and processes that improve the reliability of applications, sites, and systems in production. The Site Reliability Engineer is primarily responsible for ensuring the integrity, functionality, and reliability of applications and sites. RESPONSIBILITIES Develop...


  • Jersey City, New Jersey, United States JPMorganChase Full time

    Job Description: Elevate your engineering prowess to unprecedented levels by joining a team of exceptionally gifted professionals and position yourself among the top echelon in site reliability. As a Lead Site Reliability Engineer at JPMorgan Chase within the Community & Consumer Banking - Infrastructure & Production Management Team, you hold a leadership role...


  • Oklahoma City, Oklahoma, United States Thegradcafe Full time

    Position Overview: This is a full-time role for a Senior Site Reliability Engineer with a software development organization specializing in manufacturing and mechanical engineering. Opportunity: Join a distributed team dedicated to enhancing manufacturing processes and reducing production costs for physical products. Work Environment: This position is hybrid,...


  • Culver City, United States V-Soft Consulting Group, Inc. Full time

    Role: Site Reliability Engineer (Data Center). Number of positions: 2. Location: 5 days on-site in one of these 3 locations: Culver City, CA 90230; Mountain View, CA 94041; Bellevue, WA 98004. The ideal candidate will have experience with system operations and running large-scale, massively distributed infrastructure. Responsibilities: Data monitoring and alerting,...


  • Oklahoma City, Oklahoma, United States Zoom Full time

    Site Reliability Engineer - Workvivo. What you can expect: As a Site Reliability Engineer, you will run the production environment by monitoring availability and taking a holistic view of system health. You will build software and systems to manage platform infrastructure and applications. Your work will help improve reliability, quality, and time-to-market of...


  • Jersey City, New Jersey, United States The Goldman Sachs Group Full time

    About the Role: At The Goldman Sachs Group, we're seeking a highly skilled Site Reliability Engineering Specialist to join our Platforms team. As a key member of our global engineering team, you'll be responsible for designing, developing, and operating distributed systems that provide observability for our mission-critical applications and platform...


  • Foster City, United States Zoox Full time

    Zoox is looking for a site reliability engineer who will be responsible for measuring and maintaining the uptime of the many services critical to the development process for autonomous vehicles. In this role, you will be heavily involved in all phases of rolling out a service from designing systems that are easy to maintain and fault-tolerant through...


  • Jersey City, New Jersey, United States JPMorganChase Full time

    Job Description: Guide and shape the future of technology at a globally recognized firm, driven by pride in ownership. As a Senior Manager of Site Reliability Engineering at JPMorgan Chase within the Corporate Technology, you are the non-functional requirement owner and champion for the applications in your remit. You are a key influencer in your team's...


  • Jersey City, New Jersey, United States Devexperts Full time

    Company Description: Devexperts has been working for nearly two decades consulting and developing for the financial industry. We solve complex technological challenges facing the most well-respected financial institutions worldwide. By becoming a part of Devexperts, you'll become a part of a company that fosters self-improvement and actively seeks...


  • Oklahoma City, United States Allied Reliability Full time

    Overview: The primary focus of this role is improving the productivity and efficiency of our chemical manufacturing processes through developments of existing and to be developed control systems. You will be accountable for developing and implementing carefully designed and engineered solutions to plant operations control for improved efficiency and uptime...


  • Jersey City, United States Fidelity Investments Full time

    Job Description: The Role: As a member of the TechOps SRE team, you'll work closely with our engineering partners to help enable and drive initiatives from design to implementation. Our highly available multi-region Kubernetes (AWS EKS) environments are best-in-class and central to our enterprise-grade infrastructure strategy. These growing environments...


  • Redwood City, California, United States BetterOmics Full time

    About Betteromics: Betteromics is a cutting-edge Data Engineering Platform for Life Sciences, revolutionizing the way life science product development is accelerated with industrial strength software platforms. We are a team of mission-driven professionals, with combined experiences from academia, science, and technology companies. Our team is passionate about...


  • Jersey City, United States Bank of America Full time

    Job Description: At Bank of America, we are guided by a common purpose to help make financial lives better through the power of every connection. Responsible Growth is how we run our company and how we deliver for our clients, teammates, communities and shareholders every day. One of the keys to driving Responsible Growth is being a great place to work for...


  • Jersey City, United States Veterans Sourcing Group LLC Full time

    Site Reliability Engineer (AWS) (SRE) Jersey City, NJ - onsite 3 days/week 12 month minimum contract w/ possible full-time conversion Roles And Responsibilities Design, code, test, and deliver software to automate manual operational work. Troubleshoot priority incidents, facilitate blameless post-mortems, and ensure permanent closure of incidents. Engage...