Site Reliability Engineer

2 months ago


Charlotte, United States Sumitomo Mitsui Banking Corp Full time
JOB SUMMARY:

You will work closely with our software engineering and data teams to implement and maintain robust data pipelines and infrastructure. Your expertise in Google Cloud Platform (GCP) or Azure, container technologies such as Kubernetes or Docker, and Apache Airflow processes will be crucial in driving our success.

PRINCIPAL DUTIES & RESPONSIBILITIES:
  • Troubleshoot and resolve issues in live production environments and implement strategies to remediate them with minimal effort.
  • Manage applications through automation.
  • Support and monitor new and existing services, platforms, and application stacks.
  • Help improve the service lifecycle, from deployment through operations and refinement.
  • Provide technical expertise during service-impacting events.
  • Collaborate with other engineers on code reviews, internal infrastructure improvements and process enhancements.
  • Use scalability testing to measure, tune and optimize system performance.
  • Participate in periodic 24x7 on-call duties.
  • Be accountable for resolving outages via workaround or permanent fix.
  • Ensure all administration and reports are maintained and up to date, including contact information, technical diagrams, and post-major-incident reviews.
  • Communicate with various stakeholders.
  • Implement the Incident, Change, and Problem Management processes effectively and carry out the corresponding reporting procedures.
  • Monitor incidents to ensure that the Service Level Agreement is met.
  • Identify, initiate, and conduct incident triage.
  • Ensure the closure of all resolved and end-user confirmed Incident records.
  • Establish continuous process improvement cycles in which process performance, activities, roles and responsibilities, policies, procedures, and supporting technology are reviewed and enhanced where applicable.
  • Knowledge of application and data monitoring fundamentals (Splunk, OpenTelemetry, Dynatrace, Airflow DAGs).
  • Knowledge of log parsing, complex Splunk searches, including external table lookups, Splunk data flow, components, features, and product capability.
  • Ability to set up alerts from machine-generated data.
REQUIREMENTS:
  • Education: Bachelor's Degree or Equivalent.
  • 5+ years of experience in Software Engineering.
  • 3+ years of experience in Site Reliability.
  • Experience with one or more Cloud Platforms (GCP, Azure, AWS).
  • Experience working with a data workflow management platform such as Apache Airflow.
  • Experience with Container technologies: Kubernetes, Docker, PKS.
  • Experience setting up monitoring for applications and databases.
  • Experience with third-party services and third-party vendor management.
  • Excellent verbal, written, and interpersonal communication skills.
  • Experience in ServiceNow preferred.
  • Experience working with financial data is preferred (Metro2, 2052a, etc.).

We have other current jobs related to this field that you can find below


  • Charlotte, United States BrightSpeed Full time

    Job Description We are currently looking for a Principal Site Reliability Engineer to join our growing team. In this role, you will implement and maintain monitoring systems to track the performance and availability of business-critical systems and infrastructure using metrics to identify trends and potential issues. You will also work closely with...


  • Charlotte, United States Regions Bank Full time

    Thank you for your interest in a career at Regions. At Regions, we believe associates deserve more than just a job. We believe in offering performance-driven individuals a place where they can build a career --- a place to expect more opportunities. If you are focused on results, dedicated to quality, strength and integrity, and possess the drive to succeed,...


  • Charlotte, United States JobRialto Full time

    Job Description: Looking for a forward-thinking, energetic Site Reliability Engineering Manager to join our team. PDL serves the ecommerce needs of leading and growing grocery retailers with millions of shoppers located throughout the East Coast and Midwest. PDL strives to enable our retailers to be number one in all markets they operate in by: Leading IT...


  • Charlotte, United States Delta Air Lines Full time

    United States, Georgia, Atlanta Information Technology 04-May-2024 Ref #: 24745 How you'll help us Keep Climbing (overview & key responsibilities) Delta IT is on a journey of transformation. We are changing the way we do business from top to bottom. As thought-leaders within Delta, we strive to create significant and innovative solutions and are looking...


  • Charlotte, North Carolina, United States Brightspeed Full time

    Job Description We are currently looking for a Principal Site Reliability Engineer to join our growing team. In this role, you will implement and maintain monitoring systems to track the performance and availability of business-critical systems and infrastructure using metrics to identify trends and potential issues. You will also work closely with...


  • Charlotte, United States Brightspeed Full time

    Job Description Company Description At Brightspeed, we are reimagining how people live, work, play and connect by providing fast, reliable internet connections and an awesome customer experience in twenty states throughout the Midwest and South. Backed by funds managed by Apollo Global Management, our vision is to accelerate the upgrade of...


  • Charlotte, United States Syntricate Technologies Full time

    Platform/Site Reliability Engineer 6 Months Contract to Hire Charlotte, NC JOB DESCRIPTION We're looking for a Senior Platform Engineer to come help us automate everything, enable our developer teammates, and create and support world-class platforms. As a Senior Platform Engineer, you will be an integral member of the Platform Engineering team, helping the...


  • Charlotte, United States Saxon Global Full time

    Site Reliability Engineer JOB SUMMARY This position is responsible for design, development and implementation of cloud based technologies. Provide technical expertise on complex projects and advanced troubleshooting of existing Cloud technology for use by department. Such as guidance and support in the development of progress at all system layers, including...


  • Charlotte, United States SERC Reliability Corporation Full time

    SERC OVERVIEW: The electric grid is vital to our everyday lives. It is fundamental for the health, safety, and well-being of our communities, and provides the platform for our economy and our societal and technological advances. SERC's mission is to reduce risks to the reliability and security of the electric grid (also known as the bulk power system), not...


  • Charlotte, United States Recurring Decimal Full time

    Location: Hybrid | Charlotte, NC or Phoenix, AZ. Key Skills: Experience with one or more Cloud Platforms (Azure, GCP). Experience with Container technologies: Kubernetes, Docker, PKS, Azure Kubernetes Service (AKS). 5+ years of experience in Site Reliability engineering. Experience setting up monitoring in applications and database. Experience in ServiceNow, Jira,...


  • Charlotte, United States Cedent Consulting Full time

    Site Reliability Engineer (Charlotte, NC) Role: Site Reliability Engineer Location: Charlotte, NC Client: Healthcare client Position Responsibilities: Code strategies and languages by leveraging knowledge while working with customers on configuration management initiatives. Coordinate and assist teams in building competencies with infrastructure using object...


  • Charlotte, United States Credit Karma Full time

    Intuit Credit Karma is a mission-driven company, focused on championing financial progress for our more than 130 million members globally. While we're best known for pioneering free credit scores, our members turn to us for everything related to their financial goals, including identity monitoring, applying for credit cards, shopping for insurance and loans...


  • Reliability Engineer

    4 weeks ago


    Charlotte, United States JLL Full time

    JLL is seeking a Reliability Engineer to join our team! This exciting opportunity is responsible for providing reliability engineering support for operations and maintenance of buildings, infrastructure, and equipment assets. In coordination and full collaboration with the Engineering Services Reliability & Asset Management COE, the Reliability Engineer is...

  • Digital One

    7 days ago


    Charlotte, United States Jobs for Humanity Full time

    Job Description A variety of soft skills and experience may be required for the following role. Please ensure you check the overview below carefully. Position Type: Full time. Type Of Hire: Experienced (relevant combo of work and education). Education Desired: Bachelor of Computer Science. Travel Percentage: 5 - 10%. Job Description As the world works and lives...