Site Reliability Engineer

4 weeks ago


Charlotte, United States Bank of America Corporation Full time
Job Description

At Bank of America, we are guided by a common purpose to help make financial lives better through the power of every connection. Responsible Growth is how we run our company and how we deliver for our clients, teammates, communities and shareholders every day.

One of the keys to driving Responsible Growth is being a great place to work for our teammates around the world. We're devoted to being a diverse and inclusive workplace for everyone. We hire individuals with a broad range of backgrounds and experiences and invest heavily in our teammates and their families by offering competitive benefits to support their physical, emotional, and financial well-being.

Bank of America believes both in the importance of working together and offering flexibility to our employees. We use a multi-faceted approach for flexibility, depending on the various roles in our organization.

Working at Bank of America will give you a great career with opportunities to learn, grow and make an impact, along with the power to make a difference. Join us!

About Bank of America - Global Technology:

Global Technology delivers technology services globally across the bank's eight lines of business that serve individuals, companies, and institutions. The team also focuses on digital banking, payments, infrastructure, data management and technology that enhances cyber security, and risk and capital management. Innovation is at the heart of all Global Technology does.

Enterprise Cloud Platforms Team

The Enterprise Cloud Platforms team in the CTO organization offers private and public cloud platforms that help Bank of America's developers achieve faster time-to-market, innovate with private and public cloud capabilities, and reduce complexity through built-in integrations. We believe in a high-quality engineering culture: we engineer our platforms with a customer and platform mindset, design for large enterprise scale and resilience, and accelerate market innovation into the technical platforms we deliver.

As part of this team, you will have a large impact on the evolution of next generation Cloud services for Bank of America and explore an extensive list of new technologies that will drive innovation across our company.

We are seeking an experienced Site Reliability Engineer (SRE) to support and administer our Hybrid Cloud Container (OpenShift/AKS) platform.

Our HCCP Site Reliability Engineers (SREs) ensure that our platform meets the reliability and uptime requirements of our demanding enterprise customers. This is achieved through the best engineering practices, resilient design, and a well-defined, effective global on-call rotation that runs 24x7.

The role provides an opportunity to work with a wide range of technologies and a unique perspective on how various services (on-prem/off-prem) interact with each other. You will work with colleagues who are as smart, hardworking, and driven as you. You will join a team that keeps growing and innovating, and that gives you room to be proactive and creative.

Are you ready for the next step in your career? Then we'd love to hear from you.

Position Summary

  • Responsible for the reliability and support of the container platform on-prem and in external clouds (Azure/AWS/Google).
  • Monitor and troubleshoot performance, connectivity, security, and other issues in the container platform (OpenShift) and Azure (AKS) environments.
  • Perform deep dives into systemic and latent reliability issues, and support incident management and problem management.
  • Identify, analyze, and resolve infrastructure vulnerabilities and application deployment issues.
  • Perform blameless root cause analysis (RCA) and partner with engineering and operations teams across the organization to roll out fixes.
  • Own application onboarding and provide troubleshooting support throughout the lifecycle of applications on the container platform.
  • Identify and drive opportunities to improve automation, reduce toil, and improve operational excellence.
  • Partner with risk and compliance teams to bring visibility, implement the right controls, and remediate vulnerabilities.
  • Ensure resiliency during implementation and identify/fix resiliency problems by collaborating with engineering teams.
  • Be a key stakeholder in the design of cloud services and work with architecture, engineering, and product teams.
  • Participate in 24x7 on-call coverage under a follow-the-sun model.

Required Skills

  • BS/MS degree in Computer Science or a related technical field involving systems, or equivalent practical experience.
  • 5+ years of hands-on experience supporting Kubernetes/OpenShift/AKS/EKS container platforms.
  • Experience with Python, Ansible, Golang, and Shell scripting
  • Strong experience in major services related to Compute, Storage, Network and Security
  • Experience with monitoring tools like Prometheus and Dynatrace, as well as cloud native tools like Azure Monitor and Log Analytics
  • Strong understanding of, and background working with, complex IAM infrastructure, including Active Directory, Azure AD Connect, Azure AD, and Ping Identity or other SSO solutions.
  • Advanced knowledge of Linux OS, DNS, DHCP, Kerberos and Windows Authentication
  • Experience with CI/CD tools (Git, Jenkins) and the GitOps model
  • Excellent understanding of Linux/Windows operating systems administration
  • Experience in Container security and vulnerability remediation.
  • Systematic problem-solving approach, sense of ownership and drive
  • Ability to juggle competing priorities and adapt to changes in project scope.
  • Excellent interpersonal, organizational and communication (written, verbal, and presentation) skills are a must.
  • Proven ability to work independently with minimal supervision and as part of a team with direct responsibilities.

Desired Skills

  • Experience with OpenShift and cloud service provider (CSP) Kubernetes services such as AKS and EKS
  • Kubernetes/OpenShift/Terraform certifications are a plus
  • Experience with Terraform, ArgoCD, Tekton, and Knative technologies.
  • Experience in agile deployment methodologies (GitOps)
  • Knowledge of various container runtimes
  • Familiarity with the operator deployment pattern.
  • Experience working in a highly available multi-datacenter environment
  • Experience working with monitoring tools such as Prometheus, Splunk, Dynatrace, Sysdig, or similar tools.
  • Understanding of cost management, inventory management, and the FinOps model

Shift

1st shift (United States of America)

Hours Per Week

40


