Principal Site Reliability Engineer

3 weeks ago


Jersey City, New Jersey, United States Fidelity Investments Full time

The Role:

As a member of the TechOps SRE team at Fidelity Investments, you will work closely with our engineering partners to enable and drive initiatives from design to implementation. Our highly available multi-region Kubernetes environments are best-in-class and central to our enterprise-grade infrastructure strategy. These growing environments currently support numerous mission-critical workloads. In this exciting role, you will have the opportunity to further develop and refine your skills, collaborate across numerous Fidelity teams, and continue to grow in a fun, collaborative, and rapidly changing environment. This is a phenomenal opportunity to have a direct impact on the emerging strategies of our infrastructure and deployments while also helping to enable the expansion of our business.

The Skills and Expertise You Bring:

  • 5+ years of hands-on experience with AWS in a production environment
  • Experience building and deploying Docker images including Docker Compose
  • Production experience running Kubernetes workloads, ideally on AWS EKS
  • Experience managing and maintaining Kubernetes Clusters on AWS EKS
  • Experience with Confluent or Kafka
  • Experience creating and deploying Helm charts & libraries
  • Hands-on experience with Jenkins Core, including authoring and maintaining declarative CI/CD pipelines and libraries
  • Experience with monitoring tools, e.g., CloudWatch, Datadog, and Splunk Cloud
  • Proficiency with UNIX operating systems and shell scripting
  • Experience with Amazon Web Services (AWS), having managed services and applications in a large AWS cross-account environment using IAM and federated SSO
  • Experience crafting and maintaining logging, monitoring, and alerting capabilities using tools like Datadog and Splunk
  • Ability to communicate at all levels, with a track record of strong written and verbal communication
  • See problems as opportunities to automate
  • Ability to work independently with minimal direction
  • Drive and champion the overall design of highly available, secure, scalable microservices-based applications in AWS
  • Track record of providing technical leadership to strong teams of Site Reliability Engineers / Cloud Engineers
  • Experience with configuring and deploying resilient infrastructure in multiple regions and multiple availability zones
  • Work multi-functionally with other organizations and collaborate with our risk, product and engineering team leaders
  • Leading the initiative to craft and deploy our applications to the cloud
  • Promoting a DevOps mentality, providing mentorship, and establishing standard development methodologies for AWS infrastructure-as-code
  • Championing automation tools to improve software delivery and reduce risk
  • Production experience with infrastructure-as-code (IaC), Terraform preferred
  • Programming experience, Python preferred
  • Experience with distributed version control systems, Git preferred
  • Experience with Apache or Confluent Kafka a plus
  • Experience with the agile software development lifecycle and Kanban preferred
  • Experience with CDN providers, e.g., Akamai, preferred


The Team:

Fidelity Digital Assets℠, a Fidelity Investments Company, is developing a full-service enterprise-grade platform for storing, trading, and servicing digital assets, such as Bitcoin and Ethereum. Fidelity Digital Assets℠ embraces an entrepreneurial culture and startup mindset while serving as one of the most innovative business units within Fidelity Investments. Our global, diverse team of hundreds of forward-thinking professionals leads with agility and creativity to build solutions that bridge the gap between traditional institutional investors and their exposure to digital assets. The firm's tenure and experience across multiple business lines present our employees with unprecedented access to knowledge, technology, and resources that help our team reshape the future of finance. Within Fidelity Digital Assets, the Technical Operations team is central to our initiative of moving to the cloud. The team uses AWS services to secure our network and scale our applications to ensure their uptime and reliability. Team members are hands-on Site Reliability Engineers who promote a DevOps approach, with a focus on infrastructure-as-code, security, and automation.
