Site Reliability Engineering Lead
4 days ago
Job Description:
At Bank of America, we are guided by a common purpose to help make financial lives better through the power of every connection. We do this by driving Responsible Growth and delivering for our clients, teammates, communities and shareholders every day.
Being a Great Place to Work is core to how we drive Responsible Growth. This includes our commitment to being an inclusive workplace, attracting and developing exceptional talent, supporting our teammates' physical, emotional, and financial wellness, recognizing and rewarding performance, and how we make an impact in the communities we serve.
Bank of America is committed to an in-office culture with specific requirements for office-based attendance, while allowing an appropriate level of flexibility for our teammates and businesses based on role-specific considerations.
At Bank of America, you can build a successful career with opportunities to learn, grow, and make an impact. Join us!
This job is responsible for partnering with engineering and technology teams to implement measures prescribed by the Site Reliability Engineer teams it leads. Key responsibilities include ensuring appropriate instrumentation, tooling, ticketing, alerting and on call routines are in place for key services, demonstrating technical expertise within domains, and decomposing objectives into work units. Job expectations include advancing efficient solution delivery practices and promoting exceptional design, engineering, and organizational practices.
Responsibilities:
- Collaborates with Development and Infrastructure teams to understand technical solutions and implement monitoring capabilities outlined in the application and system monitoring designs put forward by the Senior Site Reliability Engineer (SRE)
- Develops and maintains reliability scripts, tools and libraries and leverages them for common instrumentation, automation, and operational needs, and when mentoring SRE resources on reliability practices and established tools/capabilities
- Partners to implement code changes to make use of common reliability libraries and tools and helps Application Production Services and Application Development teammates understand how to use them
- Participates regularly in architecture community of practice meetings and communication via other channels
- Identifies vulnerabilities and opportunities for reliability improvement, such as investigating low level error rates and 'noise' in monitoring, and defines solutions to reduce manual support effort and/or improve system reliability
- Engages as a subject matter expert in major incident triage, failure scenario modelling, and root cause diagnosis alongside the Problem Manager for major incident/problem management investigations
We are seeking a highly skilled and proactive Site Reliability Engineer (SRE) with strong development expertise and deep diagnostic capabilities in production systems. This role is critical to ensuring the reliability, observability, and performance of our enterprise-scale systems, as well as contributing to our internal engineering enablement platforms.
The ideal candidate brings a rare combination of deep systems-level expertise (e.g., heap/thread dump analysis, JVM tuning), production observability acumen (e.g., log and metrics pattern recognition via Splunk), and tooling development capability to proactively support performance, scalability, and root cause analysis.
Key Responsibilities:
- Own and drive production issue triage, including expert-level heap and thread dump analysis, memory profiling, garbage collection investigations, and CPU/thread diagnostics.
- Work closely with performance testing teams to monitor system behavior pre- and post-release, ensuring consistent throughput and low-latency service delivery.
- Develop and maintain monitoring and alerting solutions tailored for performance testing infrastructure and production-like environments.
- Collaborate with developers, QA, and performance engineers to interpret telemetry data, identify failure patterns, and implement self-healing mechanisms.
- Act as a technical enabler for multiple teams, providing tooling, insights, and best practices around observability, reliability engineering, and performance.
- Build internal tools that integrate with existing monitoring platforms (e.g., Splunk, Dynatrace, DNT) to collate and derive insights from performance testing and production metrics.
- Work alongside in-house development teams to enhance internal platforms that aggregate observability data, provide root cause analysis views, and enable smart test reporting.
- Champion reliability and stability by guiding incident response practices, postmortem reviews, and service-level objectives (SLOs) tracking.
Required Qualifications:
- 7-8 years of hands-on experience with heap dump, thread dump, and GC log analysis, and with JVM internals.
- Proficiency in scripting and application development (e.g., Python, Java, Shell) to create diagnostic and observability tools.
- 7-8 years' experience with logging and monitoring platforms (e.g., Splunk, Dynatrace, DNT).
- 7-8 years' experience working with distributed systems, microservice architecture, and container orchestration platforms (e.g., Kubernetes, Docker).
- Experience with performance testing environments and tools like JMeter, LoadRunner, Gatling, or custom test harnesses.
- Ability to identify systemic reliability issues and implement resilient patterns (e.g., circuit breakers, graceful degradation, retry logic).
- Exceptional debugging and root cause analysis skills across application, infrastructure, and network layers.
- Demonstrated ability to build observability tooling or integrations that serve multiple internal teams.
- Familiarity with CI/CD practices and infrastructure-as-code (e.g., Terraform, Ansible) is a plus.
Desired Qualifications:
- Self-starter with a service-oriented mindset and a relentless drive to improve system reliability.
- Comfortable working in a cross-functional, high-impact environment supporting dev, ops, and test teams.
- Strong communication and mentoring abilities to influence engineering culture around reliability and performance.
- Experience contributing to or designing internal engineering platforms or toolkits to scale team capabilities.
Skills:
- Automation
- Collaboration
- Influence
- Production Support
- Result Orientation
- Analytical Thinking
- Application Development
- Architecture
- Solution Design
- Stakeholder Management
- Adaptability
- DevOps Practices
- Project Management
- Risk Management
- Solution Delivery Process
Shift:
1st shift (United States of America)
Hours Per Week:
40