Current jobs related to Cloud Senior Site Reliability Engineer - Atlanta - Bank of America


  • Atlanta, Georgia, United States Motion Recruitment Full time

    Job Title: Senior Site Reliability Engineer - Cloud Expert. Job Summary: Motion Recruitment is seeking a highly skilled Senior Site Reliability Engineer - Cloud Expert to join our client's team. As a key member of the infrastructure team, you will be responsible for designing, implementing, and maintaining scalable and highly available cloud infrastructure on...


  • Atlanta, Georgia, United States Diversity Resource Staffing Inc Full time

    Job Summary: Diversity Resource Staffing Inc is seeking a highly skilled Senior Site Reliability Engineer to join our Consumer SRE Team. As a Senior Site Reliability Engineer, you will play a critical role in ensuring the security, resilience, scalability, and maintainability of our services for mortgage borrowers and lenders. About the Role: As a Senior Site...


  • Atlanta, Georgia, United States Motion Recruitment Full time

    Job Title: Senior Site Reliability Engineer II. At Motion Recruitment, we are seeking a highly skilled Senior Site Reliability Engineer II to join our team. As a key member of our SRE/Platform team, you will be responsible for ensuring the reliability and scalability of our SaaS-based AI/ML product. About the Role: Work closely with the SRE/Platform team to...


  • Atlanta, Georgia, United States Diversity Resource Staffing Inc Full time

    Senior Site Reliability Engineer: This is an exciting opportunity for a skilled Senior Site Reliability Engineer to join our Consumer SRE Team in the IMT division, providing secure, resilient, scalable, and maintainable services for mortgage borrowers and lenders. Our client, a division of a leading financial services company, operates numerous financial and...


  • Atlanta, Georgia, United States Motion Recruitment Full time

    Job Title: Senior Site Reliability Engineer. Job Type: Full-time. Location: Atlanta, Georgia. Job Description: A leading healthcare and software company in Atlanta, Georgia, is seeking a highly skilled Senior Site Reliability Engineer to join its team. The company specializes in cancer treatments and best practices for chemotherapy, aiming to provide the most...


  • Atlanta, Georgia, United States Motion Recruitment Full time

    Job Title: Senior Cloud Reliability Engineer. Job Type: Full-time. Location: Atlanta, GA. Job Description: We are seeking a highly skilled Senior Cloud Reliability Engineer to join our team at Motion Recruitment. As a Senior Cloud Reliability Engineer, you will be responsible for designing, implementing, and maintaining the company's cloud infrastructure, ensuring...


  • Atlanta, Georgia, United States Ultimate Software Full time

    About the Role: We are seeking a highly skilled Site Reliability Engineer to join our team at Ultimate Software. As a Site Reliability Engineer, you will play a critical role in ensuring the reliability and efficiency of our cloud-based services. Key Responsibilities: Design and implement scalable and reliable cloud infrastructure solutions; develop and maintain...


  • Atlanta, Georgia, United States Motion Recruitment Full time

    About the Role: We are seeking a highly skilled Senior Site Reliability Engineer to join our team in Atlanta, Georgia. As a key member of our infrastructure team, you will be responsible for ensuring the reliability and scalability of our cloud-based platform. Key Responsibilities: Design and implement scalable and reliable cloud infrastructure using AWS and...


  • Atlanta, Georgia, United States STORD Full time

    About Stord: Stord is a leading commerce enablement provider of fulfillment services and technology that powers seamless checkout and delivery experiences for high-volume mid-market and enterprise brands across all channels. Job Description: We are seeking a mission-driven Senior Site Reliability Engineer to be a driving force behind an exceptionally resilient,...


  • Atlanta, Georgia, United States PagerDuty Full time

    About the Role: PagerDuty is seeking a highly skilled Senior Site Reliability Engineer to join our SRE-Platform team. As a key contributor, you will be responsible for building, maintaining, and scaling the Kubernetes platform that powers our operations. Key Responsibilities: Maintain the overall health of the platform, including triaging and troubleshooting...


  • Atlanta, Georgia, United States Microsoft Corporation Full time

    Job Description: Microsoft Corporation is seeking a highly skilled Senior Cloud Reliability Engineer to join our Cloud+Artificial Intelligence (C+AI) Silver SQL Team. This team is responsible for deploying and operating the Azure SQL family of services within Azure Government clouds. In this role, you will have the opportunity to work with engineers who enable...


  • Atlanta, Georgia, United States Highbrow Full time

    About the Role: We are seeking a highly skilled Site Reliability Engineer to join our team at Highbrow. As a key member of our infrastructure team, you will play a critical role in ensuring the reliability, scalability, and performance of our systems. Key Responsibilities: Collaboration and Communication: Work closely with our Application and DevOps teams to ensure...


  • Atlanta, United States Featurespace Full time

    The Opportunity: As our Senior Site Reliability Engineer, you will help us achieve our goals and deliver success on behalf of our customers by operating Featurespace's world-leading product, ARIC Risk Hub, as a robust cloud-based SaaS solution. In addition, you will work on continuously improving our SaaS offering's features and robustness. This is an...


  • Atlanta, Georgia, United States PagerDuty Full time

    About the Role: PagerDuty is seeking a highly skilled Senior Site Reliability Engineer to join our SRE-Platform team. As a key contributor, you will be responsible for building, maintaining, and scaling the Kubernetes platform that powers PagerDuty. Key Responsibilities: Ensure the overall health of the platform, including triaging and troubleshooting production...


  • Atlanta, Georgia, United States Datum Technologies Group Full time

    Job Title: Site Reliability Engineer. We are seeking a highly skilled Site Reliability Engineer to join our team at Datum Technologies Group. As a Site Reliability Engineer, you will play a critical role in ensuring the reliability, scalability, and performance of our cloud-based infrastructure. Key Responsibilities: Implement and improve monitoring, alerting,...


  • Atlanta, United States Sage Full time

    Sage Senior Site Reliability Engineer - Atlanta, Georgia. Sage is searching for an enthusiastic Senior Site Reliability Engineer to support our global SaaS platform for ecommerce retailers, ensuring our products and services are constantly available for our customers to run their businesses. In this role, you will bridge the technical gaps between Development...

Cloud Senior Site Reliability Engineer

4 months ago


Atlanta, United States Bank of America Full time

Job Description:

At Bank of America, we are guided by a common purpose to help make financial lives better through the power of every connection. Responsible Growth is how we run our company and how we deliver for our clients, teammates, communities and shareholders every day.

One of the keys to driving Responsible Growth is being a great place to work for our teammates around the world. We’re devoted to being a diverse and inclusive workplace for everyone. We hire individuals with a broad range of backgrounds and experiences and invest heavily in our teammates and their families by offering competitive benefits to support their physical, emotional, and financial well-being.

Bank of America believes both in the importance of working together and offering flexibility to our employees. We use a multi-faceted approach for flexibility, depending on the various roles in our organization.

Working at Bank of America will give you a great career with opportunities to learn, grow and make an impact, along with the power to make a difference. Join us!

Senior Site Reliability Engineering, Hybrid Cloud Container Platform, Enterprise Cloud Platforms

About Bank of America – Global Technology:

Global Technology delivers technology services globally across the bank’s eight lines of business that serve individuals, companies, and institutions. The team also focuses on digital banking, payments, infrastructure, data management and technology that enhances cyber security, and risk and capital management. Innovation is at the heart of all Global Technology does.

Enterprise Cloud Platforms Team:

The Enterprise Cloud Platforms team in the CTO organization offers private and public cloud platforms that help Bank of America's developers drive faster time-to-market and innovation with private and public cloud capabilities, and reduce complexity with built-in integrations. We believe in a high-quality engineering culture: we engineer our platforms with a customer and platform mindset, design for large enterprise scale and resilience, and accelerate market innovation into the technical platforms we deliver.

As part of this team, you will have a large impact on the evolution of next generation Cloud services for Bank of America and explore an extensive list of new technologies that will drive innovation across our company.

We are seeking an experienced Senior Cloud Site Reliability Engineer (SRE) to support and administer our Hybrid Cloud Container (OpenShift/AKS) platform.

Our Cloud Service Reliability Engineers (cSREs) ensure that our cloud services meet the reliability and uptime requirements of our demanding enterprise customers. This is achieved through best engineering practices, resilient design, and a well-defined, effective global on-call rotation that runs 24x7.

The role provides the opportunity to work with a wide range of technologies and gain a unique perspective on how various services (on-prem/off-prem) interact with each other. You will work with colleagues who are as smart, hardworking, and driven as you, on a team that keeps growing and innovating and gives you room to be proactive and creative.

Are you ready for the next step in your career? Then we'd love to hear from you!

Position Summary:

  • Responsible for the reliability and support of the Container PaaS platform on-prem and off-prem (Azure/AWS/Google)

  • Monitor and troubleshoot performance, connectivity, security, and other issues in the Container PaaS platform (OpenShift) and Azure (AKS) environments

  • Perform deep dives into systemic and latent reliability issues; drive incident management and problem management

  • Identify, analyze, and resolve infrastructure vulnerabilities and application deployment issues

  • Perform blameless root cause analysis (RCA) and partner with engineering and operations teams across the organization to roll out fixes

  • Identify and drive opportunities to improve automation for the PaaS services; scope and create automation for deployment, management, and visibility of our services.

  • Evaluate and automate scaling and capacity requirements within PaaS environments

  • Partner with risk and compliance teams to bring visibility and implement the right controls and policies in the PaaS platform

  • Ensure resiliency during implementation and identify/fix resiliency problems by collaborating with engineering teams

  • Be a key stakeholder in the design of cloud services and work with architecture, engineering, and product teams

  • Participate in 24x7 on-call coverage using a follow-the-sun model

Required Skills:

  • BS /MS degree in Computer Science or related technical field involving systems or equivalent practical experience.

  • Minimum of 8 years of hands-on experience supporting Kubernetes/OpenShift/Container PaaS platforms

  • Experience with Python, Ansible and shell scripting

  • Kubernetes/OpenShift/Terraform certifications are a plus

  • Strong experience in major services related to Compute, Storage, Network and Security

  • Experience with monitoring tools like Prometheus and Dynatrace, as well as cloud native tools like Azure Monitor and Log Analytics

  • Strong understanding of, and background working with, complex Active Directory and IAM controls

  • Advanced knowledge of DNS, DHCP, Kerberos and Windows Authentication

  • Experience with CI/CD tools (Git, Jenkins) and the GitOps model

  • Excellent understanding of Linux/Windows operating system administration

  • Systematic problem-solving approach, sense of ownership and drive

  • Ability to juggle competing priorities and adapt to changes in project scope.

  • Excellent interpersonal, organizational and communication (written, verbal, and presentation) skills are a must.

  • Proven ability to work independently with minimal supervision and as part of a team with direct responsibilities.

Desired Job Skills:

  • Experience with OpenShift and managed Kubernetes services such as AKS, EKS, or GKE

  • Experience with Terraform, Argo CD, Tekton, and Knative technologies

  • Experience in agile deployment methodologies (GitOps)

  • Knowledge of various container runtimes

  • Familiarity with the operator deployment pattern.

  • Experience working in a highly available multi-datacenter environment

  • Experience working with monitoring tools such as Prometheus, Splunk, Dynatrace, Sysdig, or similar tools.

  • Understanding of cost management, inventory management, FinOps model

Shift:

1st shift (United States of America)

Hours Per Week:

40