Senior Site Reliability Engineer

2 months ago


Irving, United States PTR Global Full time

Job Description:


Run our infrastructure with Terraform, Azure PaaS and/or Kubernetes.


  • Make monitoring and alerting notify on symptoms and not on outages.
  • Document so your findings turn into repeatable actions, and then into automation.
  • Improve the deployment process to make it as boring as possible.
  • Independently debug production issues across services and levels of the stack.
  • Communicate proactively about issues, and propose ideas and solutions within the product team to reduce workload through automation.
  • Plan, design and execute solutions within the product team to reach specific goals agreed within the team.
  • Plan and execute configuration change operations both at the application and the infrastructure level.
  • Actively look for opportunities to improve the availability and performance of the system by applying the learnings from monitoring and observation
  • Complete Root Cause Analysis (RCA) investigations
  • Gain a deep understanding of the portfolio and its integrations
  • Improve DevOps practices, accelerate delivery, and take a lead role in troubleshooting technical issues and recommending changes to improve resiliency
  • Develop strategic technology roadmaps
  • Respond to TechOps incidents and provide support for customer incidents.


All you'll need for success


Minimum Qualifications - Education & Prior Job Experience:


Bachelor's degree in Computer Engineering, Computer Science, Electrical Engineering or related field, and 5 years of experience


General knowledge of the following areas with deep knowledge in 2 areas:


  • Implement "Infrastructure as Code" using Terraform in Azure and on-prem infrastructure resources
  • Implement GitHub, GHA CI/CD and ADO cloud for automation
  • Load balance the application, including proxies and CDN (automated)
  • Implement monitoring and observability in AKS and K8s
  • Monitoring and Metrics in Dynatrace, Prometheus, Grafana and integrations with Moogsoft/xMatters
  • Open source Logging infrastructure
  • Able to script automated performance testing scenarios for APIs and web front ends and embed them in CI/CD pipelines
  • Dashboarding/reporting query languages
  • Backend storage management and scaling


Preferred Qualifications - Education & Prior Job Experience:


Master's degree in Computer Engineering, Computer Science, Electrical Engineering or related field, and 3 years of experience


  • Airline Industry experience helpful


Skills, Licenses & Certifications


Proficiency and demonstrated experience in the following technologies:


  • Experienced in technology transformations and migration to one or more Cloud platforms such as AWS, Azure or GCP
  • Hands-on experience with Infrastructure as a Service (IaaS) and Platform as a Service (PaaS) tools and platforms, and with containers and container orchestration platforms (e.g. Docker and Kubernetes)
  • Expertise in one or more cloud-native relational databases such as MySQL and PostgreSQL, and NoSQL databases such as Cassandra and MongoDB; experience migrating to/from enterprise-class databases highly desired
  • Strong technical knowledge and skills that are broad and deep, covering various hardware, software, and technology platforms
  • Node.js, TypeScript, JavaScript
  • Database and persistence frameworks: Mongo, Oracle, Object/Relational Mapping, Query performance tuning
  • Experience with Mongo Schema Design and Mongo Aggregation Framework
  • Develop, implement, and maintain applications and systems that integrate MongoDB
  • Web Services: GraphQL, REST/SOAP (JSON/WSDL/XML)
  • DB Admin/SQL Server
  • Terraform
  • SysAdmin
  • Troubleshooting Network Issues
  • VM Management
  • Dynatrace
  • PingFederate
  • Airwall
  • Security Vulnerabilities (remediation/compliance)
  • IIS
