Resiliency and Site Reliability Expert



Malvern, United States Vanguard Full time

At Vanguard, we pride ourselves on delivering an exceptional client experience to all investors; at the core of this experience are systems that reside in a technically complex and constantly evolving resiliency landscape. Passionate, technically skilled engineers are at the center of our resiliency operations, and we are looking to grow our team.

As a Lead Resiliency and Reliability Engineer at Vanguard, you will play a critical role in solving impactful operational problems. You will think creatively to find opportunities to improve system performance and efficiency, scalability, fault tolerance, and self-healing capabilities. You’ll apply Chaos Engineering principles to challenge our systems and discover hidden weaknesses, all while understanding the big picture of how our systems work together to create the ultimate client experience.

Core Responsibilities:

  • Instruments, enhances, and advocates for system observability. Identifies and develops solutions to bridge system observability gaps.
  • Collaborates with internal teams to evaluate the health, stability, and reliability of systems/platforms. Looks for opportunities to improve system availability, performance efficiency, and resiliency.
  • Develops and communicates new standards and newly available tools and frameworks across subdivisions. Enforces reliability standards. Designs and develops new automated solutions for reliability.
  • Makes contributions to centrally managed (IT-wide) inner source libraries for reliability, such as the OpenTelemetry wrapper libraries.
  • Provides technical leadership, consultancy, and coaching on designing and implementing both traditional and serverless architectures in AWS with an emphasis on repeatability, scaling options, resilience, reliability, telemetry, networking, etc., including design patterns for resilient systems.
  • Leads failure modes analysis spanning product families when new features and architecture patterns are introduced. Leads cross-product or cross-subdivision chaos experimentation. Facilitates post-incident reviews for any high-severity, client-impacting events local to the product family.
  • Designs, reviews, and coaches others on performance tests using appropriate components (e.g., requests per minute, number of threads, the construction of a request with headers and cookies).
  • Consults, reviews, coaches, and influences architectural decisions, including non-functional aspects, proposing potential technical solutions/enhancements, and explaining convincingly which is better and why.
  • Contributes to or leads Reliability Engineering and Resilience communities of practice. Remains informed about site reliability engineering activities happening within the subdivision.
  • Provides technical leadership, guidance, consulting, training, and governance on SRE to one or more product families in a subdivision.  Works with product owners and teams to set subdivision goals for higher availability and SRE impact, and tracks progress toward achieving them.
  • Identifies opportunities to automate away toil and develops solutions, monitors error budget exhaustion rates, configures auto scaling thresholds for the product, and incorporates resilience patterns, such as circuit breakers, into the application code. Develops complex deployment and/or routing strategies for high availability.
  • Maintains and looks for opportunities to improve the centralized incident response playbook for the subdivision, which documents standards for managing communication and escalation during an incident. Oversees blameless post-incident reviews for high-severity incidents involving multiple product families.
  • Onboard and train new SRE Practitioners and Leads within the subdivision
  • Maintain and enforce subdivisional reliability engineering standards
  • Communicate new standards and newly available tools and frameworks back to other SREs within their subdivisions, for example, news regarding observability tools, libraries for resilience patterns, and emerging cloud platforms.
  • Aggregation of quantifiable data about availability to report back to senior leadership
  • Coordination of cross-product and/or cross-subdivision Chaos experimentation
  • Maintenance of any centralized incident response playbooks for the subdivision. Note: These differ significantly from runbooks. Runbooks document the step-by-step process to recover a specific component within a system. Incident playbooks document standards for managing communication and escalation during an incident, including handoffs to other teams.
  • Facilitation of blameless post-incident reviews for high severity incidents or incidents involving more than one product family
  • Regularly attend any Reliability Engineering and Resilience communities of practice
  • Host Retail subdivision Reliability Engineering community of practice
  • Remain informed about SRE activities happening within the subdivision

What it Takes:

  • Minimum of eight years related work experience, with at least three years of development experience.
  • Undergraduate degree or equivalent combination of training and experience. Graduate degree preferred.
  • Full stack development – JDK 8+ preferred, with Spring Boot, REST APIs, multithreaded and multiprocessing applications, and GraphQL. Experience with UI development (familiarity with Angular, TypeScript, Node.js, etc.) is a plus.
  • Ability to diagnose and resolve problems in high-throughput applications.
  • Experience with one or more observability frameworks or tools, such as OpenTelemetry (Java, JavaScript, etc.), CloudWatch, Grafana, or Splunk.
  • Exposure to *nix environments including some shell script development and basic command execution.
  • Strong understanding of database principles and working knowledge in distributed storage and infrastructural solutions.
  • Experience with container management (such as Docker) and microservices architectures in cloud and on-premises infrastructure.
  • Working knowledge of AWS network foundations, application networking, edge, and network security.
  • Excellent communication and documentation skills.

Special Factors:

  • Vanguard is not offering visa sponsorship for this position.

Sponsorship

Vanguard is offering visa sponsorship for this position.

About Vanguard

We are Vanguard. Together, we’re changing the way the world invests.

For us, investing doesn’t just end in value. It starts with values. Because when you invest with courage, when you invest with clarity, and when you invest with care, you can get so much more in return. We invest with purpose – and that’s how we’ve become a global market leader. Here, we grow by doing the right thing for the people we serve. And so can you.

We want to make success accessible to everyone. This is our opportunity. Let’s make it count.

Inclusion Statement

Vanguard’s continued commitment to diversity and inclusion is firmly rooted in our culture. Every decision we make to best serve our clients, crew (internally employees are referred to as crew), and communities is guided by one simple statement: “Do the right thing.”

We believe that a critical aspect of doing the right thing requires building diverse, inclusive, and highly effective teams of individuals who are as unique as the clients they serve. We empower our crew to contribute their distinct strengths to achieving Vanguard’s core purpose through our values.

When all crew members feel valued and included, our ability to collaborate and innovate is amplified, and we are united in delivering on Vanguard's core purpose.

Our core purpose: To take a stand for all investors, to treat them fairly, and to give them the best chance for investment success.

How We Work

Vanguard has implemented a hybrid working model for the majority of our crew members, designed to capture the benefits of enhanced flexibility while enabling in-person learning, collaboration, and connection. We believe our mission-driven and highly collaborative culture is a critical enabler to support long-term client outcomes and enrich the employee experience.


