Site Reliability Engineer

3 days ago


Chicago, Illinois, United States Stardom Employment Consultants Full time
Job Description:

As a Site Reliability Engineer at Stardom Employment Consultants, you will be responsible for maintaining and improving the reliability, availability, and performance of our systems. You will collaborate closely with development, operations, and security teams to build and automate scalable infrastructure, monitor system health, and address issues before they impact users.

Key Responsibilities:

  • Design and Implement Highly Available Infrastructure: Develop and manage highly available and scalable infrastructure in cloud and on-premises environments, ensuring seamless integration with existing systems.
  • Automation and Tooling: Develop and maintain automation scripts and tools to streamline operations, deployments, and monitoring, improving efficiency and reducing downtime.
  • System Performance and Availability: Track system performance and availability with monitoring tools (Prometheus, Grafana, Nagios, etc.) and respond to incidents to minimize downtime, ensuring business continuity.
  • Collaboration and Communication: Work closely with development teams to design and deploy reliable, efficient, and secure services, fostering a culture of collaboration and knowledge sharing.
  • Incident Management and Root Cause Analysis: Conduct root cause analysis of incidents and implement solutions to prevent recurrence, ensuring the highest level of system reliability.
  • CI/CD Pipelines and Automation: Implement and manage CI/CD pipelines to automate code deployment and infrastructure changes, improving efficiency and reducing manual errors.
  • System Optimization and Capacity Planning: Optimize system performance, capacity, and cost by identifying bottlenecks and areas for improvement, ensuring the best possible return on investment.
  • Best Practices and Compliance: Develop and enforce best practices for incident management, disaster recovery, and business continuity, ensuring compliance with relevant standards and regulations.
  • On-Call Rotations and Support: Participate in on-call rotations to ensure 24/7 support for critical systems and services, providing timely and effective support to minimize downtime.
  • Security and Compliance: Collaborate with security teams to ensure systems are secure and compliant with relevant standards and regulations, protecting sensitive data and ensuring business continuity.

Qualifications:

  • Bachelor's degree in Computer Science, Information Technology, or a related field; relevant certifications (AWS Certified DevOps Engineer, Google Professional SRE) are a plus.
  • Minimum of 3.5 years of experience in site reliability engineering, systems engineering, or a related role.
  • Strong experience with cloud platforms (AWS, Azure, Google Cloud, etc.) and containerization technologies (Docker, Kubernetes, etc.).
  • Proficient in scripting and programming languages (Python, Go, Bash, etc.) for automation and tooling.
  • Experience with configuration management tools (Ansible, Puppet, Chef, etc.) and infrastructure as code (Terraform).
  • Solid understanding of networking, security, and system administration.
  • Experience with CI/CD tools and practices (Jenkins, GitLab CI, CircleCI, etc.).
  • Excellent problem-solving skills and the ability to troubleshoot complex systems.
  • Strong communication and collaboration skills with a focus on teamwork and knowledge sharing.
  • Ability to work in a fast-paced environment and manage multiple priorities effectively.


  • Chicago, Illinois, United States Calabitek Full time

    Job Description: Position: Site Reliability Engineer; Location: Remote; Experience: 10+ years. This position is responsible for ensuring application observability, maintenance, and support. The role involves identifying and implementing proactive preventive measures, evaluating, and recommending techniques, practices, or technologies that align with business...


  • Chicago, Illinois, United States Calabitek Full time

    Job Overview: Position: Site Reliability Engineer; Location: Chicago, IL (Local Candidates Preferred); Experience: 10+ Years. This position is crucial for ensuring application observability, ongoing maintenance, and robust support. The role involves identifying and implementing proactive preventive measures, as well as evaluating and recommending techniques,...


  • Chicago, Illinois, United States Oak Street Health Full time

    About Oak Street Health: Oak Street Health is a leading healthcare technology company that is transforming the way healthcare is delivered to seniors. Our mission is to inspire and empower healthcare providers to deliver high-quality, patient-centered care. Job Summary: We are seeking a highly skilled Site Reliability Engineer to join our team. As a Site...


  • Chicago, Illinois, United States National Black MBA Association Full time

    About the Role: This is a strategic and transformation-focused role within the National Black MBA Association's Global Technology organization. As a Manager of Site Reliability Engineering, you will play a key part in ensuring the reliable and efficient operation of our security services. Key Responsibilities: Design and drive monitoring, alerting, and ticket...


  • Chicago, Illinois, United States National Black MBA Association Full time

    About the Role: This is a strategic and transformation-focused role within the National Black MBA Association's Global Technology organization. As a Manager of Site Reliability Engineering, you will play a key part in ensuring the reliable and efficient operation of our security services. Key Responsibilities: Design and drive monitoring, alerting, and...


  • Chicago, Illinois, United States Oak Street Health Full time

    Transformative Role at Oak Street Health: We are seeking a skilled Site Reliability Engineer to collaborate with our software engineering teams in implementing monitoring and alerting solutions, designing performance tests, and automating tasks to enhance efficiency. Key Responsibilities: Design and implement telemetry, monitoring, and alerting systems to ensure...


  • Chicago, Illinois, United States Circle Full time

    About Circle: Circle is a pioneering financial technology company at the forefront of the emerging internet of money, where value can flow freely, globally, and instantly, revolutionizing the way we think about payments, commerce, and markets. Our cutting-edge infrastructure, including the blockchain-based USDC, empowers businesses, institutions, and...


  • Chicago, Illinois, United States The Hartford Full time

    Senior Site Reliability Engineer: At The Hartford, we are committed to making a significant impact as an insurance provider that transcends traditional coverages and policies. Being part of our team means you have the opportunity to achieve your professional aspirations while assisting others in reaching theirs. Join us as we work towards shaping the...


  • Chicago, Illinois, United States Gusto Full time

    About Gusto: Gusto is a modern, online people platform that helps small businesses take care of their teams. On top of full-service payroll, Gusto offers health insurance, 401(k)s, expert HR, and team management tools. Today, Gusto offices in Denver, San Francisco, and New York serve more than 300,000 businesses nationwide. Our mission is to create a world...


  • Chicago, Illinois, United States Donato Technologies, Inc Full time

    Job Overview: Position Title: DevOps Engineer; Company: Donato Technologies, Inc; Work Model: Hybrid; Onsite Days: Tuesday - Thursday; Contract Duration: 6 Months. Position Summary: We are in search of a skilled DevOps Engineer to partner with our Application Development teams in delivering innovative business solutions through agile methodologies while effectively...


  • Chicago, Illinois, United States DASH2 Full time

    Overview: DASH2 is seeking experienced technical professionals who are eager to excel in delivering top-tier SaaS solutions. We provide a stimulating environment that encourages growth, adaptability, and the consistent application of your skills. Our clients depend on us during critical moments, and our engineering team is committed to fulfilling that...


  • Chicago, Illinois, United States TEKsystems Full time

    Position Overview: This Site Reliability Engineering (SRE) team is responsible for facilitating in-depth advisory sessions, establishing SRE program leadership internally, and recruiting and nurturing talent for client projects. The Practice Architect will be strategic, generating innovative SRE methodologies in areas such as observability, production...


  • Chicago, Illinois, United States DASH2 Full time

    Overview: DASH2 is seeking skilled technical professionals at various levels who are eager to challenge themselves in delivering top-tier SaaS solutions. We provide a stimulating environment that encourages growth, adaptability, and the consistent application of your expertise. Our clients depend on us during critical moments, and our engineering team is...


  • Chicago, Illinois, United States Itron, Inc. Full time

    Itron is revolutionizing how utilities and cities manage energy and water. We are committed to creating a more sustainable, resourceful world. Join us. Job Family Summary: Plans, designs, develops and tests software systems or applications for software enhancements and new products including cloud-based or internet-related tools. Evaluates reliability of...


  • Chicago, Illinois, United States Jobot Full time

    Remote Azure Site Reliability Engineer Opportunity: This position is hosted by Jobot Consulting. About Us: We are a dynamic tech consulting firm seeking a Senior Cloud Site Reliability Engineer with a strong background in Azure Cloud. In this role, you will play a key part in implementing Site Reliability Engineering (SRE) practices across our enterprise...


  • Chicago, Illinois, United States Jobot Full time

    Remote Azure Site Reliability Engineer Opportunity with a Leading Tech Consulting Firm. About Us: We are a dynamic consulting organization seeking a seasoned Cloud Site Reliability Engineer with a strong foundation in Azure Cloud technologies. This fully remote position is pivotal in implementing Site Reliability Engineering (SRE) methodologies across our...

  • Reliability Engineer

    1 month ago


    Chicago, Illinois, United States GATX Full time

    Overview: Founded in 1898 and headquartered in Chicago, IL, GATX Corporation (NYSE: GATX) is an industry leader with 125+ years of success, success that is powered by our people. We are proud of our high-performance culture, hard-working and enthusiastic management team, and beautiful office space in the Willis Tower. At GATX, we hire the best and offer our...


  • Chicago, Illinois, United States Adyen Full time

    About Adyen: Adyen is a leading financial technology platform that provides payments, data, and financial products to businesses. Our mission is to empower companies to achieve their ambitions by delivering innovative and ethical solutions. Job Summary: We are seeking a highly skilled Senior Site Reliability Engineer, Internal Services to join our team. As a key...


  • Chicago, Illinois, United States The Hartford Full time

    About The Hartford: The Hartford is a leading insurance company that goes beyond traditional coverages and policies. We're committed to making a difference and proud to be an organization that values innovation and excellence. Job Summary: We're seeking a highly skilled Staff Reliability Engineer to join our Reliability Engineering Team. As a key member of our...


  • Chicago, Illinois, United States CCC Intelligent Solutions, Inc. Full time

    About the Role: Career Opportunities at CCC Intelligent Solutions, Inc. We are seeking a highly skilled Senior Cloud Reliability Engineer to join our team at CCC Intelligent Solutions, Inc. As a key member of our engineering team, you will play a critical role in ensuring the reliability, scalability, and performance of our cloud-based applications and services...