Site Reliability Engineer

4 days ago


Chicago, United States Stardom Employment Consultants Full time

Job Description:

As a Site Reliability Engineer, you will be responsible for maintaining and improving the reliability, availability, and performance of our systems. You will collaborate closely with development, operations, and security teams to build and automate scalable infrastructure, monitor system health, and address issues before they impact users. The ideal candidate will have a strong background in both software development and systems engineering, with a passion for automation, monitoring, and continuous improvement.

Key Responsibilities:

  • Design, implement, and manage highly available and scalable infrastructure in cloud and on-premises environments.
  • Develop and maintain automation scripts and tools to streamline operations, deployments, and monitoring.
  • Monitor system performance and availability using monitoring tools (Prometheus, Grafana, Nagios, etc.) and respond to incidents to minimize downtime.
  • Work closely with development teams to design and deploy reliable, efficient, and secure services.
  • Conduct root cause analysis of incidents and implement solutions to prevent recurrence.
  • Implement and manage CI/CD pipelines to automate code deployment and infrastructure changes.
  • Optimize system performance, capacity, and cost by identifying bottlenecks and areas for improvement.
  • Develop and enforce best practices for incident management, disaster recovery, and business continuity.
  • Participate in on-call rotations to ensure 24/7 support for critical systems and services.
  • Collaborate with security teams to ensure systems are secure and compliant with relevant standards and regulations.

Qualifications:

  • Bachelor's degree in Computer Science, Information Technology, or a related field; relevant certifications (AWS Certified DevOps Engineer, Google Professional SRE) are a plus.
  • Minimum of 3-5 years of experience in site reliability engineering, systems engineering, or a related role.
  • Strong experience with cloud platforms (AWS, Azure, Google Cloud, etc.) and containerization technologies (Docker, Kubernetes, etc.).
  • Proficiency in scripting and programming languages (Python, Go, Bash, etc.) for automation and tooling.
  • Experience with configuration management tools (Ansible, Puppet, Chef, etc.) and infrastructure as code (Terraform).
  • Solid understanding of networking, security, and system administration.
  • Experience with CI/CD tools and practices (Jenkins, GitLab CI, CircleCI, etc.).
  • Excellent problem-solving skills and the ability to troubleshoot complex systems.
  • Strong communication and collaboration skills, with a focus on teamwork and knowledge sharing.
  • Ability to work in a fast-paced environment and manage multiple priorities effectively.

Remote Work:

No



  • Chicago, United States Resource Logistics Full time

    Role: Site Reliability Engineer. Location: Chicago, IL. Hire Type: Full-time. Responsibilities: Expertise with monitoring, alerting, reliability engineering, and observability; experience with Splunk, SignalFx, or similar tools; ability to create log ingestions; identify metrics and KPIs; app, platform, and infra logging and alerting best practices; creating dashboards,...


  • Chicago, Illinois, United States Calabitek Full time

    Job Description. Position: Site Reliability Engineer. Location: Remote. Experience: 10+ years. This position is responsible for ensuring application observability, maintenance, and support. The role involves identifying and implementing proactive preventive measures, and evaluating and recommending techniques, practices, or technologies that align with business...


  • Chicago, United States Definity First Full time

    We are seeking a skilled and motivated Site Reliability Engineer (SRE) to join our dynamic team. As an SRE at Definity First, you will play a crucial role in ensuring the reliability, scalability, and performance of our systems. You will collaborate with cross-functional teams to design, build, and maintain our infrastructure, and you'll have the opportunity...


  • Chicago, Illinois, United States Calabitek Full time

    Job Overview. Position: Site Reliability Engineer. Location: Chicago, IL (Local Candidates Preferred). Experience: 10+ Years. This position is crucial for ensuring application observability, ongoing maintenance, and robust support. The role involves identifying and implementing proactive preventive measures, as well as evaluating and recommending techniques,...


  • Chicago, Illinois, United States Oak Street Health Full time

    About Oak Street Health: Oak Street Health is a leading healthcare technology company that is transforming the way healthcare is delivered to seniors. Our mission is to inspire and empower healthcare providers to deliver high-quality, patient-centered care. Job Summary: We are seeking a highly skilled Site Reliability Engineer to join our team. As a Site...


  • Chicago, Illinois, United States National Black MBA Association Full time

    About the Role: This is a strategic and transformation-focused role within the National Black MBA Association's Global Technology organization. As a Manager of Site Reliability Engineering, you will play a key part in ensuring the reliable and efficient operation of our security services. Key Responsibilities: Design and drive monitoring, alerting, and ticket...


  • Chicago, United States Oneview Healthcare Full time

    Job Description. Position Overview: Site Reliability Engineers support the smooth functioning of the Oneview system for our hospital customers, using their advanced technical and coding skills. People in this role will be former systems administrators or operations engineers with strong coding skills. Career development in this role...


  • Chicago, Illinois, United States Oak Street Health Full time

    Transformative Role at Oak Street Health: We are seeking a skilled Site Reliability Engineer to collaborate with our software engineering teams in implementing monitoring and alerting solutions, designing performance tests, and automating tasks to enhance efficiency. Key Responsibilities: Design and implement telemetry, monitoring, and alerting systems to ensure...


  • Chicago, Illinois, United States Circle Full time

    About Circle: Circle is a pioneering financial technology company at the forefront of the emerging internet of money, where value can flow freely, globally, and instantly, revolutionizing the way we think about payments, commerce, and markets. Our cutting-edge infrastructure, including the blockchain-based USDC, empowers businesses, institutions, and...


  • Chicago, United States Cleo Full time

    Site Reliability Engineer At Cleo, we make doing business easy! Cleo is an established software company with a start-up feel. We have awesome products, which go hand in hand with our awesome culture! We are devoted to our people and pride ourselves on creating a fun, laid-back, but fast-paced work environment. Not only do we work hard, we play hard. We have...


  • Chicago, United States Saxon Global Full time

    Northern Trust Site Reliability Engineer (Azure). Location: Downtown Chicago, onsite 2 days/week, 181 W Madison St. Duration: 12+ month contract w/ extension/conversion. Overview: The Goals Driven Wealth Management platform is a showcase product for Northern Trust's Wealth Management business, and we must demonstrate our ability to deliver and...


  • Chicago, United States AmericanEagle.com Full time

    Americaneagle.com is a family-owned web design, development, and digital marketing agency with a passionate belief in the power of technology to positively transform business practices. Our focus is on helping customers grow and achieve success in the digital space. We cover a variety of different industries, including eCommerce, associations & nonprofits,...


  • Chicago, United States PDSSOFT Full time

    8-month contract. Only locals within an hour's drive of Chicago, IL, US, 60602. Must have 10+ yrs of IT experience. Work Model: Hybrid. Anchor Days: Monday, Wednesday, Friday. Hours: 8:30am - 5pm CST. Job Post Title: Site Reliability/DevOps Engineer. Job Post Summary: Seeking a Site Reliability/DevOps Engineer to gather and analyze metrics to assist in...


  • Chicago, United States PDSSOFT Full time

    8-month contract. Only locals within an hour's drive of Chicago, IL, US, 60602. Must have 10+ yrs of IT experience. Work Model: Hybrid. Anchor Days: Monday, Wednesday, Friday. Hours: 8:30am - 5pm CST. Job Post Title: Site Reliability/DevOps Engineer. Job Post Summary: Seeking a Site Reliability/DevOps Engineer to gather and analyze metrics to assist...


  • Chicago, Illinois, United States The Hartford Full time

    Senior Site Reliability Engineer: At The Hartford, we are committed to making a significant impact as an insurance provider that transcends traditional coverages and policies. Being part of our team means you have the opportunity to achieve your professional aspirations while assisting others in reaching theirs. Join us as we work towards shaping the...


  • Chicago, United States Outdefine Full time

    Site Reliability Engineer, Uber Freight (Software, 500+ Employees). Location: Chicago, Illinois, USA. About the Job. Overview: Outdefine is a web3 talent community that connects top talent with leading-edge companies and enterprises globally. Companies choose to hire Outdefine Trusted Members because their skills and readiness have been proven. When you accept a...


  • Chicago, United States Cboe Full time

    Job Description Building trusted markets — powered by our people. At Cboe Global Markets, we inspire our people to solve complex challenges together because what we do matters. We provide the financial infrastructure that powers the global economy. As a leading provider of market infrastructure and tradable products, Cboe delivers cutting-edge trading,...


  • Chicago, Illinois, United States McDonald's Corporation Full time

    Job Summary: This opportunity is part of the DevOps Center of Excellence in the Corporate Productivity Delivery office, where our mission is to help our product engineering teams deliver faster with improved quality and reliability. We work multi-functionally with our global product teams and market teams in defining and executing on our automation test...