Site Reliability Engineer

3 weeks ago


Richmond, United States Bank of America Full time

Job Description: At Bank of America, we are guided by a common purpose to help make financial lives better through the power of every connection. Responsible Growth is how we run our company and how we deliver for our clients, teammates, communities and shareholders every day.

One of the keys to driving Responsible Growth is being a great place to work for our teammates around the world. We’re devoted to being a diverse and inclusive workplace for everyone. We hire individuals with a broad range of backgrounds and experiences and invest heavily in our teammates and their families by offering competitive benefits to support their physical, emotional, and financial well-being.

Bank of America believes both in the importance of working together and offering flexibility to our employees. We use a multi-faceted approach for flexibility, depending on the various roles in our organization.

Working at Bank of America will give you a great career with opportunities to learn, grow and make an impact, along with the power to make a difference. Join us!

About Bank of America – Global Technology: Global Technology delivers technology services globally across the bank’s eight lines of business that serve individuals, companies, and institutions. The team also focuses on digital banking, payments, infrastructure, data management and technology that enhances cyber security, and risk and capital management. Innovation is at the heart of all Global Technology does.

Enterprise Cloud Platforms Team: The Enterprise Cloud Platforms team in the CTO organization offers Private and Public Cloud platforms for Bank of America’s developers to drive faster time-to-market, innovate with private and public cloud capabilities, and reduce complexity with built-in integrations. We believe in a high-quality engineering culture: we engineer our platforms with a customer and platform mindset, design for large enterprise scale and resilience, and accelerate market innovation into the technical platforms we deliver. As part of this team, you will have a large impact on the evolution of next generation Cloud services for Bank of America and explore an extensive list of new technologies that will drive innovation across our company.

We are seeking an experienced Site Reliability Engineer (SRE) to support and administer our Hybrid Cloud Container Platform (HCCP; OpenShift/AKS). Our HCCP SREs ensure that our Platform meets the reliability and uptime requirements of our demanding enterprise customers. This is achieved through sound engineering practices, resilient design, and a well-defined and effective global on-call rotation that runs 24x7. The role provides an opportunity to work with a wide range of technologies and offers a unique perspective on how various services (on-prem/off-prem) interact with each other.

You will work with colleagues who are as smart, hardworking, and driven as you. You will get an opportunity to work in a team that keeps growing, innovating, and giving you room to be proactive and creative. Are you ready for the next step in your career? Then we’d love to hear from you.

Position Summary

  • Responsible for reliability and support of the Container Platform on-prem and in external clouds (Azure/AWS/Google).
  • Monitor and troubleshoot Container platform (OpenShift) and Azure (AKS) environment performance issues, connectivity issues, security issues, etc.
  • Perform deep dives into systemic and latent reliability issues, incident management, and problem management.
  • Identify, analyze, and resolve infrastructure vulnerabilities and application deployment issues.
  • Perform blameless RCA and partner with engineering and operations teams across the organization to roll out fixes.
  • Own application onboarding and provide troubleshooting support through the lifecycle of the applications on the container platform.
  • Identify and drive opportunities to improve automation to reduce toil and improve operational excellence.
  • Partner with risk and compliance teams to bring visibility, implement the right controls, and remediate vulnerabilities.
  • Ensure resiliency during implementation and identify/fix resiliency problems by collaborating with engineering teams.
  • Be a key stakeholder in the design of cloud services and work with architecture, engineering, and product teams.
  • Participate in 24x7 on-call coverage (follow-the-sun model).

Required Skills

  • BS/MS degree in Computer Science or a related technical field involving systems, or equivalent practical experience.
  • 5+ years of hands-on experience supporting Kubernetes/OpenShift/AKS/EKS container platforms.
  • Experience with Python, Ansible, Golang, and shell scripting.
  • Strong experience in major services related to Compute, Storage, Network and Security.
  • Experience with monitoring tools like Prometheus and Dynatrace, as well as cloud native tools like Azure Monitor and Log Analytics.
  • Strong understanding of, and background working with, complex IAM infrastructure, including Active Directory, Azure AD Connect, Azure AD, and Ping Identity or other SSO solutions.
  • Advanced knowledge of Linux OS, DNS, DHCP, Kerberos and Windows Authentication.
  • Experience with CI/CD tools (Git/Jenkins) and the GitOps model.
  • Excellent understanding of Linux/Windows operating systems administration.
  • Experience in container security and vulnerability remediation.
  • Systematic problem-solving approach, sense of ownership and drive.
  • Ability to juggle competing priorities and adapt to changes in project scope.
  • Excellent interpersonal, organizational and communication (written, verbal, and presentation) skills are a must.
  • Proven ability to work independently with minimal supervision and as part of a team with direct responsibilities.

Desired Skills

  • Experience in OpenShift and CSP Kubernetes services such as AKS and EKS.
  • Kubernetes/OpenShift/Terraform certifications are a plus.
  • Experience in Terraform, ArgoCD, Tekton, and Knative technologies.
  • Experience in agile deployment methodologies (GitOps).
  • Knowledge of various container runtimes.
  • Familiarity with the operator deployment pattern.
  • Experience working in a highly available multi-datacenter environment.
  • Experience working with monitoring tools such as Prometheus, Splunk, Dynatrace, Sysdig, or similar tools.
  • Understanding of cost management, inventory management, and the FinOps model.

Shift: 1st shift (United States of America)

Hours Per Week: 40



  • Richmond, United States Nucleusteq Full time

    JD - Site Reliability Engineer Location: Richmond, VA Duration: 4 months Description: Client’s Enterprise Data Machine Learning (EDML) employs innovative minds like yourself to design and develop software-systems that can meet the demand of our ever-growing customer base. Like a startup inside an enterprise, EDML focuses on using a customer-centric approach...


  • Richmond, United States Gridiron IT Full time

    GridIron IT is seeking 2 Senior Site Reliability Engineers local to Langley AFB. Active Secret Clearance Required. The Site Reliability Engineer (SRE) shall be able to build and maintain infrastructure as code on large scale multi-site deployments. The SRE shall utilize their experience to evaluate and assess new ways to scale platform capabilities. The...


  • Richmond, United States CoreWeave Full time

    CoreWeave is a specialized cloud provider, delivering a massive scale of GPU compute resources on top of the industry’s fastest and most flexible infrastructure. CoreWeave builds cloud solutions for compute intensive use cases — VFX and rendering, machine learning and AI, batch processing, and Pixel Streaming — that are up to 35 times faster and 80%...


  • Richmond, United States NVIDIA Full time

    NVIDIA is looking for a Site Reliability Engineer (SRE) to join its Networking Support team. As an SRE at NVIDIA you will ensure the reliability and uptime of our customers' production environments. We are seeking an SRE with the mentality and methodology to maintain, monitor and troubleshoot DC networking equipment. SRE's culture of diversity,...

  • Engineer I, II, III

    3 weeks ago


    Richmond, United States Dominion Energy Full time

    Engineer I, II, III - Reliability Engineer (On-site) At Dominion Energy we love our jobs. That's right. Love. Every day we go to work filled with passion to be excellent, to creatively problem solve and to innovate. These are exciting days for energy companies, and Dominion Energy aims to shape the future of energy in America. We are looking at all of our...

  • Engineer I, II, III

    3 weeks ago


    Richmond, United States National Guard Employment Network Full time

    Job Description ATTENTION MILITARY AFFILIATED JOB SEEKERS - Our organization works with partner companies to source qualified talent for their open roles. The following position is available to Veterans, Transitioning Military, National Guard and Reserve Members, Military Spouses, Wounded Warriors, and their Caregivers. If you have the required skill set,...


  • Richmond, Virginia, United States CarMax Full time

    8116 - Midtown Office W. Broad Street, Richmond, Virginia, 23220 CarMax, the way your career should be. Who we are looking for: The Senior Technology Manager's primary responsibility is to partner with their business and technology peers to provide solutions and services that help deliver CarMax's strategic mission and plans. This position will direct and...


  • Engineer I, II, III

    3 weeks ago


    Richmond, United States VetJobs Full time

    Job Description ATTENTION MILITARY AFFILIATED JOB SEEKERS - Our organization works with partner companies to source qualified talent for their open roles. The following position is available to Veterans, Transitioning Military, National Guard and Reserve Members, Military Spouses, Wounded Warriors, and their Caregivers. If you have the required skill set,...


  • Richmond, United States KBR Full time

    KBR is in search of a skilled Site Facilities Engineer to lead overall operations, maintenance, and performance of our government customer’s sites. You will manage a team of around 60 specialists, focusing on operational effectiveness, maintenance, safety, and environmental compliance. The ideal candidate will demonstrate strong communication skills,...

  • Site Engineer

    4 weeks ago


    Richmond, United States AMG Services Full time

    COMPANY OVERVIEW: AMG Inc. is a full-service engineering company that has been in business for more than 42 years. We provide services to industrial clients nationwide. We service many industries including Food Ingredient Processing, Agricultural Commodities Processing, Chemicals, Plastics, Biotech and Minerals, to name a few. Come be a part of the AMG Inc....

  • Site Engineer

    2 weeks ago


    Richmond, United States AMG Services Full time

    COMPANY OVERVIEW: AMG Inc. is a full-service engineering company that has been in business for more than 42 years. We provide services to industrial clients nationwide. We service many industries including Food Ingredient Processing, Agricultural Commodities Processing, Chemicals, Plastics, Biotech and Minerals, to name a few. Come be a part of the AMG Inc....


  • Richmond, United States Channel Personnel Services Inc Full time

    Job Description The Electrical Reliability Engineer II is responsible for electrical reliability projects supporting the Maintenance and Reliability organization at the Chesterfield Plant by applying Reliability Engineering principles, statistical data analysis and supporting work process. This is a fast-paced position that must thrive in a...


  • Richmond, United States Avature Full time

    WestRock (NYSE: WRK) is a global leader in sustainable paper and packaging solutions. We are materials scientists, packaging designers, mechanical engineers and manufacturing experts with a shared purpose: Innovate Boldly. Package Sustainably. Guided by our values of integrity, respect, accountability and excellence, we use leading science and technology to...

  • Site Inspector

    1 week ago


    Richmond, United States Wallace Montgomery Full time

    WHY YOU SHOULD JOIN OUR TEAM. ABOUT WALLACE MONTGOMERY: Since 1975, our multi-disciplined engineering organization has grown to become a recognized leader in planning, engineering, and construction management. As an Engineering News-Record (ENR) Top 500 design firm, our staff of professional engineers, planners, surveyors, technicians, construction...


  • Richmond, Virginia, United States KBR Full time

    Title: Site Team Lead, Engineering and Design. KBR Sustainable Technology Solutions (STS) provides holistic and value-added solutions across the entire asset life cycle. These include world-class licensed process technologies, differentiated advisory services, deep technical domain expertise, energy transition solutions, high-end design capabilities, and smart...


  • Richmond, Virginia, United States DuPont Full time

    At DuPont, we are working on things that matter; whether it's providing clean water to more than a billion people on the planet, producing materials that are essential in everyday technology devices from smartphones to electric vehicles, or protecting workers around the world. If you would like to be a part of a premier multi-industrial company that is...

