Site Reliability Engineer

1 month ago


Chicago, United States Prohires Full time

Job Title: Site Reliability Engineer

Location: Onsite 3 days a week in Riverwoods, IL (Chicago)

Expert Application Engineer (SRE)

Job Description:

As an Application Reliability Engineer, you'll tap into your passion for finding and fixing inefficiencies to solve our reliability and performance issues. In our Agile environment, you'll focus on availability, latency, performance, efficiency, change and problem management, monitoring, emergency response, and capacity planning of our services. Your projects will deliver enhanced infrastructure, development, and deployment automation.

At a minimum, here's what we need: 8+ years of experience in Information Technology, (Software) Engineering, or a related field

Responsibilities:

  • Analyse, design, program, test, and deploy new user stories and features with high quality (security, reliability, operations) to production
  • Achieves team commitments (and influences others to do the same) by using informal leadership and highly developed communication skills
  • Has oversight of design decisions and guides the team to achieve key results for the products assigned to them
  • Remediates issues using engineering principles and creates proactive design solutions for potential failures
  • Work with a team of site reliability engineers that is responsible for building the continuous reliability mindset, shepherding problem management, and driving key site reliability engineering practices into the organization.
  • Design and drive monitoring, alerting, and ticket reporting strategies to measure SLA, SLO, MTTI, MTTR, etc., and align with management expectations to minimize production downtime
  • Guide site reliability automation to help eliminate manual toil and create a self-healing capability
  • Participate in selection of appropriate automation tools, defining technology, quality, experience and implementation standards and practices within own technical domain.
  • Fosters a culture of excellence and continuous learning within the chapter. Establishes and tracks appropriate OKRs to ensure outcomes are met.
  • Creates solutions addressing high impact technology and business priorities
  • Is competent in multiple areas, such as programming languages, security, automation, testing, infrastructure, and performance, and is the go-to person for many people (inside and outside of their team)
  • Proactively identifies and mitigates issues based on intuition and experience in multiple domains

Must Have Skills:

  • Experienced with AWS Cloud
  • Experienced in building and managing OCP (OpenShift Container Platform) clusters and deploying applications into OCP
  • Experience with SRE design to address reliability and resiliency, targeting five-nines (99.999%) availability
  • Experience in managing caching solutions like Hazelcast, GemFire, or Terracotta
  • Experience in setting up and managing Kafka
  • High level of familiarity with the Linux command line and scripting
  • Extremely comfortable with production environments, firewalls, and networking
  • Strong experience in deploying, observing, alerting, logging, and monitoring systems (Splunk, Datadog, AppDynamics, Instana) with a mindset towards predictive analysis
  • Working knowledge of automation tools such as Ansible, Terraform, or Chef
  • Experience in performing RCA, Disaster Recovery activities, Chaos Engineering

Good to have Skills:

  • Experience working in the payments industry (highly preferred)
  • Deep knowledge and understanding of emerging trends in the SRE field.
  • Experience developing in Java (or other similar languages)
  • Studied architectural patterns at scale, including thoughtfully designed APIs, repeatable delivery pipelines, and efficient computer engineering principles.
  • Working knowledge of messaging services like RabbitMQ, SQS, Kafka
  • Strong experience with Continuous Integration and Continuous Delivery models, including Blue/Green and/or Canary release models

Tools & Technologies:

  • OpenShift Container Platform
  • Splunk, Datadog, AppDynamics, Instana
  • Hazelcast
  • Ansible, Terraform, or Chef
  • RabbitMQ, SQS, Kafka
  • Linux VMs, shell scripting
  • AWS Cloud
  • PostgreSQL database


