No more applications are being accepted for this job

Cloud Senior Site Reliability Engineer

3 weeks ago

Jersey City, United States Bank of America Full time

Job Description:

At Bank of America, we are guided by a common purpose to help make financial lives better through the power of every connection. Responsible Growth is how we run our company and how we deliver for our clients, teammates, communities and shareholders every day.

One of the keys to driving Responsible Growth is being a great place to work for our teammates around the world. We're devoted to being a diverse and inclusive workplace for everyone. We hire individuals with a broad range of backgrounds and experiences and invest heavily in our teammates and their families by offering competitive benefits to support their physical, emotional, and financial well-being.

Bank of America believes both in the importance of working together and offering flexibility to our employees. We use a multi-faceted approach for flexibility, depending on the various roles in our organization.

Working at Bank of America will give you a great career with opportunities to learn, grow and make an impact, along with the power to make a difference. Join us!

Senior Site Reliability Engineering, Hybrid Cloud Container Platform, Enterprise Cloud Platforms

About Bank of America - Global Technology:

Global Technology delivers technology services globally across the bank's eight lines of business that serve individuals, companies, and institutions. The team also focuses on digital banking, payments, infrastructure, data management and technology that enhances cyber security, and risk and capital management. Innovation is at the heart of all Global Technology does.

Enterprise Cloud Platforms Team:

The Enterprise Cloud Platforms team in the CTO organization offers Private and Public Cloud platforms for Bank of America's developers to drive faster time-to-market, enable innovation with private and public cloud capabilities, and reduce complexity with built-in integrations. We believe in a high-quality engineering culture: we engineer our platforms with a customer and platform mindset, design for large enterprise scale and resilience, and accelerate market innovation into the technical platforms we deliver.

As part of this team, you will have a large impact on the evolution of next generation Cloud services for Bank of America and explore an extensive list of new technologies that will drive innovation across our company.

We are seeking an experienced Senior Cloud Site Reliability Engineer (SRE) to support and administer our Hybrid Cloud Container (OpenShift/AKS) platform.

Our Cloud Service Reliability Engineers (cSREs) ensure that our Cloud services meet the reliability and uptime requirements of our demanding enterprise customers. This is achieved through the best engineering practices and resilient design, and through a well-defined and effective global on-call rotation that runs 24x7.

The role provides an opportunity to work with a wide range of technologies and a unique perspective on how various services (on-prem/off-prem) interact with each other. You will work with colleagues who are as smart, hardworking, and driven as you. You will get an opportunity to work in a team that keeps growing and innovating, and that gives you room to be proactive and creative.

Are you ready for the next step in your career? Then we'd love to hear from you.

Position Summary:

Responsible for reliability and support of the Container PaaS Platform on-prem/off-prem (Azure/AWS/Google)
Monitor and troubleshoot Container PaaS platform (OpenShift) and Azure (AKS) environment performance issues, connectivity issues, security issues, etc.
Perform deep dives into systemic and latent reliability issues, incident management, and problem management
Identify, analyze, and resolve infrastructure vulnerabilities and application deployment issues
Perform blameless RCA and partner with engineering and operations teams across the organization to roll out fixes
Identify and drive opportunities to improve automation for the PaaS services; scope and create automation for deployment, management, and visibility of our services
Evaluate and automate the scaling and capacity requirements within PaaS environments
Partner with risk and compliance teams to bring visibility and implement the right controls and policies in the PaaS Platform
Ensure resiliency during implementation and identify/fix resiliency problems by collaborating with engineering teams
Be a key stakeholder in the design of cloud services and work with architecture, engineering, and product teams
Participate in 24x7 on-call coverage under a follow-the-sun model

Required Skills:

BS/MS degree in Computer Science or a related technical field involving systems, or equivalent practical experience.
8+ years of hands-on experience supporting Kubernetes/OpenShift/Container PaaS platforms
Experience with Python, Ansible and shell scripting
Kubernetes/OpenShift/Terraform certifications are a plus
Strong experience in major services related to Compute, Storage, Network and Security
Experience with monitoring tools like Prometheus and Dynatrace, as well as cloud native tools like Azure Monitor and Log Analytics
Strong understanding of, and background working with, complex Active Directory and IAM controls
Advanced knowledge of DNS, DHCP, Kerberos and Windows Authentication
Experience with CI/CD tools (Git/Jenkins) and the GitOps model
Excellent understanding of Linux/Windows operating systems administration
Systematic problem-solving approach, sense of ownership and drive
Ability to juggle competing priorities and adapt to changes in project scope.
Excellent interpersonal, organizational and communication (written, verbal, and presentation) skills are a must.
Proven ability to work independently with minimal supervision and as part of a team with direct responsibilities.

Desired Job Skills:

Experience in OpenShift and managed Kubernetes services such as AKS, EKS, or GKE
Experience in Terraform, ArgoCD, Tekton, and Knative technologies
Experience in agile deployment methodologies (GitOps)
Knowledge of various container runtimes
Familiarity with the operator deployment pattern.
Experience working in a highly available multi-datacenter environment
Experience working with monitoring tools such as Prometheus, Splunk, Dynatrace, Sysdig, or similar tools.
Understanding of cost management, inventory management, and the FinOps model

Shift:
1st shift (United States of America)

Hours Per Week:
40

Cloud Senior Site Reliability Engineer

12 hours ago

Jersey City, United States Hispanic Technology Executive Council Full time

At Bank of America, we are guided by a common purpose to help make financial lives better through the power of every connection. Responsible Growth is how we run our company and how we deliver for our clients, teammates, communities and shareholders every day. One of the keys to driving Responsible Growth is being a great place to work for our teammates...
Site Reliability Engineer

3 weeks ago

Jersey City, New Jersey, United States Devexperts Full time

Company Description: Devexperts has been working for nearly two decades consulting and developing for the financial industry. We solve complex technological challenges facing the most well-respected financial institutions worldwide. By becoming a part of Devexperts, you'll become a part of a company that fosters self-improvement and actively seeks...
Site Reliability Engineer

1 week ago

Jersey City, United States DevExperts Full time

Devexperts has been working for nearly two decades consulting and developing for the financial industry. We solve complex technological challenges facing the most well-respected financial institutions worldwide. By becoming a part of Devexperts, you'll become a part of a company that fosters self-improvement and actively seeks out-of-the-box ideas. Our teams...
Site Reliability Engineer

3 weeks ago

Jersey City, United States Devexperts Full time

Devexperts has been working for nearly two decades consulting and developing for the financial industry. We solve complex technological challenges facing the most well-respected financial institutions worldwide. By becoming a part of Devexperts, you’ll become a part of a company that fosters self-improvement and actively seeks out-of-the-box ideas. Our...
Site Reliability Engineer

2 weeks ago

Jersey City, United States DevExperts Full time

Devexperts has been working for nearly two decades consulting and developing for the financial industry. We solve complex technological challenges facing the most well-respected financial institutions worldwide. By becoming...
Site Reliability Engineer

3 hours ago

Arizona City, United States Openlane Full time

Job Description: Site Reliability Engineer (f.k.a. Platform Engineer) for CarsArrive Network, Inc. located in Mesa, AZ. Provide daily, hands-on assistance to maintain and advance the build process to ensure reliability and optimum integration with Continuous Integration/Continuous Delivery (CI/CD) and Release Management. Work with the development,...
Aumni - Site Reliability Engineer III

2 weeks ago

Jersey City, New Jersey, United States tapwage Full time

There's nothing more exciting than being at the center of a rapidly growing field in technology and applying your skillsets to drive innovation and modernize the world's most complex and mission-critical systems. As a Site Reliability Engineer III at JPMorgan Chase within the Digital Private Markets/Aumni (A JP Morgan Chase Company), you will solve complex...
Site Reliability Engineer

4 days ago

Jersey City, United States BCforward Full time

Job Title: Site Reliability Engineer (AWS) (SRE) Type: W2 (Strictly No C2C and no sponsorship available) Location: Jersey City or Plano or Delaware (Hybrid) Duration: 9 Months Contract to hire Hybrid Model: 3 Days onsite 2 days remote a. Skillset AWS, Big Data, Spark, Python, Shell / Perl Scripting, Control-M, Autosys. Grafana, AppDynamics, APICA b....
Site Reliability Engineer

3 weeks ago

Jersey City, United States Pinnacle Group, Inc. Full time

W2 only - Preferred Citizen or Green Card Holder. Contract to Hire. Must Have: AWS Certification; 7-8 years of experience and 2 years of AWS exp. Tools: Grafana, DataDog. Database: MySQL or Oracle. Unix, Linux, Shell Scripting, LAN, NFS; Python, Go Lang, Terraform, Jenkins; Docker, Kubernetes. Site Reliability Engineer (AWS) (SRE) Roles and Responsibilities: • Design,...
Senior Reliability Engineer

3 weeks ago

Jersey City, United States Ben Aris Full time

Job Summary: The Senior Reliability Engineer will identify and resolve mechanical issues, improving quality, capacity, and reliability. Collaborating with onsite technical teams, corporate networks, and vendors, you'll strive to improve plant reliability and performance while applying technical knowledge to analyze causes of long-term reliability issues,...
Site Reliability Engineer

4 weeks ago

Redwood City, California, United States C3 Full time

We are looking for a Site Reliability Engineer to join our team at our HQ in Redwood City, CA. Responsibilities: Maximize system uptime and availability, ensuring functional and performance SLAs. Establish end-to-end monitoring and alerting on all critical aspects. Solve complex problems for critical services and build automation to prevent problem...
Senior Reliability Engineer

2 weeks ago

Jersey City, United States Ben Aris LLC Full time

Job Summary: The Senior Reliability Engineer will identify and resolve mechanical issues, improving quality, capacity, and reliability. Collaborating with onsite technical teams, corporate networks, and vendors, you'll strive to improve plant reliability and performance while applying technical knowledge to analyze causes of...
Senior Site Reliability Engineer

3 weeks ago

Redwood City, United States Attain Full time

About Attain: Built for consumers and companies alike. In a world driven by data, we believe consumers and businesses can coexist. Our founders had a vision to empower consumers to leverage their greatest asset, their data, in exchange for modern financial services. Built with this vision in mind, our platform allows consumers to access savings tools, earned...
Senior Site Reliability Engineer

2 weeks ago

Kansas City, United States Gorilla Logic Full time

Gorilla Logic Overview: Gorilla Logic provides nearshore Agile teams to Fortune 500 and SMB companies, bringing unparalleled expertise in the delivery of full-stack web, mobile, and enterprise applications. Our highly collaborative Agile Gorillas are uniquely qualified to implement complex software initiatives. With offices in the United States, Costa Rica,...
Site Reliability Engineer

3 days ago

Jersey City, United States Veterans Sourcing Group LLC Full time

Site Reliability Engineer (AWS) (SRE) Jersey City, NJ- onsite 3 days/ week 12 month minimum contract w/ possible full time conversion Roles And Responsibilities Design, code, test, and deliver software to automate manual operational work Troubleshoot priority incidents, facilitate blameless post-mortems, and ensure permanent closure of incidents Engage with...
Lead Site Reliability Engineer

3 weeks ago

Oklahoma City, United States BJ's Wholesale Club Full time

Locations: BJ's Club Support Center, Marlborough, MA #5997. Time type: Full time. Posted: 2 Days Ago. Job requisition ID: R147855. Join our team of more than 34,000 team members, supporting our members and communities in our Club Support Center, 235+ clubs and eight...
SRE / Site Reliability Engineer// W2 Only

2 weeks ago

Arizona City, United States Brothers Consulting Full time

Key Skills: • Experience with one or more Cloud Platforms (Azure, GCP) • Experience with Container technologies: Kubernetes, Docker, PKS, Azure Kubernetes Service (AKS) • 5+ years of experience in Site Reliability engineering • Experience setting up monitoring in applications and database. • Experience in ServiceNow, Jira,...
