Site Reliability Engineer, VP

14 hours ago


Dallas, Texas, United States The Goldman Sachs Group Full time
About the Role

We are seeking a highly skilled Site Reliability Engineer, VP to join our team at The Goldman Sachs Group. As a key member of our engineering team, you will be responsible for designing, implementing, and maintaining scalable and reliable systems that support our business operations.

Responsibilities
  • Own technical operations for systems that manage hundreds of thousands of compute cores
  • Build observability for new deployments to ensure robustness from day one, as well as mature deployments to identify and implement improvements
  • Troubleshoot and resolve issues with block devices, file descriptors, and packet loss
  • Lead real-time outage investigations and present postmortems to senior management
  • Define SLIs and SLOs and partner with development teams to ensure systems are sufficiently well designed and instrumented
  • Partner with our development team throughout development and operations
  • Plan and manage deployments and migrations (including end-of-life programs)
  • Plan and implement robust business continuity and security programs
  • Provide regional coverage for the Procmon platform and participate in on-call support
Requirements
  • 5+ years of relevant professional experience
  • 3+ years of experience with Linux fundamentals and system administration
  • 3+ years of networking experience (familiarity with TCP/IP, IP routing, firewalls, secure tunneling protocols)
  • 3+ years of experience working with distributed computing systems and cloud computing environments
  • Excellent problem-solving and automation skills
  • Proficiency in at least one programming language; the team uses a mix of Go, Python and Erlang
  • Able to operate effectively in a mission critical, highly regulated financial services environment
About Goldman Sachs

At Goldman Sachs, we commit our people, capital and ideas to help our clients, shareholders and the communities we serve to grow.

We believe who you are makes you better at what you do. We're committed to fostering and advancing diversity and inclusion in our own workplace and beyond by ensuring every individual within our firm has a number of opportunities to grow professionally and personally, from our training and development opportunities and firmwide networks to benefits, wellness and personal finance offerings and mindfulness programs.

Learn more about our culture, benefits, and careers.



  • Dallas, Texas, United States The Goldman Sachs Group Full time

    About the Role: We are seeking a highly skilled Site Reliability Engineer, VP to join our team at The Goldman Sachs Group. As a key member of our engineering team, you will be responsible for designing, implementing, and maintaining large-scale distributed systems that support our business operations. Key Responsibilities: Own technical operations for systems...


  • Dallas, Texas, United States The Goldman Sachs Group Full time

    About the Role: We are seeking a highly skilled Site Reliability Engineer to join our Procmon Platform team at Goldman Sachs. As a key member of our engineering team, you will be responsible for ensuring the reliability and scalability of our systems, which manage hundreds of thousands of compute cores. Responsibilities: Own technical operations for systems that...


  • Dallas, Texas, United States CV Library Full time

    Site Reliability Engineer. Job Summary: We are seeking a skilled Site Reliability Engineer to join our team. As a Site Reliability Engineer, you will be responsible for ensuring the availability, reliability, and performance of our applications and services. Key Responsibilities: Monitor and analyze system performance to identify areas...


  • Dallas, Texas, United States STIAOS Technologies Full time

    Job Title: Site Reliability Engineer. We are seeking a highly skilled Site Reliability Engineer to join our team at STIAOS Technologies in Dallas, TX. As a key member of our engineering team, you will be responsible for ensuring the reliability and scalability of our ecommerce platform. Key Responsibilities: Collaborate with cross-functional teams to identify...


  • Dallas, Texas, United States Themesoft Inc. Full time

    Site Reliability Engineer. At Themesoft Inc., we're seeking a highly skilled Site Reliability Engineer to join our team. As a key member of our engineering team, you will be responsible for ensuring the reliability, scalability, and performance of our cloud-based systems. Key Responsibilities: Foster a culture of reliability and efficiency by sharing best...


  • Dallas, Texas, United States STIAOS Technologies Full time

    Job Title: Site Reliability Engineer. We are seeking a highly skilled Site Reliability Engineer to join our team at STIAOS Technologies in Dallas, TX. As a key member of our engineering team, you will be responsible for ensuring the reliability and scalability of our software systems. Key Responsibilities: Collaborate with cross-functional teams to identify and...


  • Dallas, Texas, United States Diverse Lynx Full time

    Job Description. Role: Site Reliability Engineer/DevOps Engineer. Location: Dallas, TX (Onsite). Duration: Full-time. Job Description: We are seeking a highly skilled Site Reliability Engineer to join our team at Diverse Lynx LLC. As a Site Reliability Engineer, you will be responsible for ensuring the availability, reliability, and performance of our applications...


  • Dallas, Texas, United States Net2Source Inc. Full time

    Job Title: Site Reliability Engineering Manager. Net2Source Inc. is a leading provider of total workforce solutions, recognized for its accelerated growth and commitment to delivering high-quality staffing services. As a Site Reliability Engineering Manager, you will play a critical role in ensuring the reliability and scalability of our systems, collaborating...


  • Dallas, Texas, United States Learfield Full time

    About Learfield: Learfield is a leading media and technology services company in intercollegiate athletics, unlocking the value of college sports for brands and fans through an omnichannel platform with innovative content and commerce solutions for fan engagement. Job Summary: We are seeking an experienced Senior Site Reliability Engineer to join our team,...


  • Dallas, Texas, United States Goldman Sachs Full time

    About This Role: We are seeking a highly skilled Site Reliability Engineering Specialist to join our team at Goldman Sachs. As a Site Reliability Engineer, you will play a critical role in ensuring the availability and reliability of our firm's most critical platform services. Key Responsibilities: Develop and implement incident management processes to ensure...


  • Dallas, Texas, United States Signify Health Full time

    About the Role: We are seeking a highly skilled Site Reliability Engineer to join our vibrant team at Signify Health. As a Site Reliability Engineer, you will play a critical role in ensuring the stability, scalability, and availability of our products. Key Responsibilities: Develop and Implement Strategies to improve the performance and reliability of our...


  • Dallas, Texas, United States Learfield Full time

    Learfield is seeking a seasoned Senior Site Reliability Engineer to join our team. As a key member of our SRE team, you will be responsible for ensuring the reliability, availability, and performance of our live services. Your expertise in Linux container technologies, public and private clouds, and cloud orchestration frameworks will be instrumental in...

  • Software Engineer

    1 week ago


    Dallas, Texas, United States Federal Reserve Bank Full time

    About the Role: The Federal Reserve Bank of Dallas is seeking a highly motivated and experienced Software Engineer to join our Site Reliability Engineering (SRE) team. As a key member of our team, you will be responsible for designing, developing, and implementing scalable, highly available system architectures to handle increasing loads and user demands. Key...

  • Software Engineer

    5 days ago


    Dallas, Texas, United States Federal Reserve Bank Full time

    About the Role: The Federal Reserve Bank of Dallas is seeking a highly motivated and experienced Software Engineer to join our Site Reliability Engineering (SRE) team. As a key member of our team, you will be responsible for designing, developing, and implementing scalable, highly available system architectures to handle increasing loads and user demands. Key...


  • Dallas, Texas, United States Signify Health Full time

    About the Role: Signify Health is seeking a highly skilled Site Reliability Engineer II to join our vibrant team. As a Site Reliability Engineer, you will play a critical role in ensuring the stability, scalability, and availability of our products. Your Key Responsibilities: Develop and Implement Strategies: Design and implement strategies to improve the...


  • Dallas, Texas, United States Goldman Sachs Full time

    About Goldman Sachs: At Goldman Sachs, we're committed to fostering a culture of innovation and collaboration. Our team is dedicated to helping our clients, shareholders, and communities grow. As a leading global investment banking, securities, and investment management firm, we're headquartered in New York and maintain offices around the world. GS Accelerate: GS...


  • Dallas, Texas, United States Diverse Lynx Full time

    Job Summary: Diverse Lynx LLC is seeking a highly skilled Site Reliability Engineer to join our team. As a Site Reliability Engineer, you will be responsible for ensuring the availability, reliability, and performance of our cloud-based systems. Key Responsibilities: System Monitoring and Alerting: Develop and maintain monitoring tools and alerting systems to...


  • Dallas, Texas, United States Apple Full time

    Job Summary: Apple is seeking a highly skilled Site Reliability Engineering Manager to lead a team responsible for providing a platform for mission-critical cloud systems to maintain constant uptime, scale seamlessly, and allow for new applications and services to flourish. Key Responsibilities: Establish and maintain SRE practices for a private cloud service to...


  • Dallas, Texas, United States Wise Skulls llc Full time

    Job Overview. Position: Site Reliability Engineer (Python). Location: Dallas, TX (On-site presence required). Contract Duration: 12 months. Partnering Company: Wise Skulls LLC. Client: Confidential. Key Responsibilities: Minimum of 5 years of relevant experience in the field. Proficient in Python programming and familiar with frameworks such as Django or Flask. Mandatory...


  • Dallas, Texas, United States Hitachi Full time

    About the Role: We're seeking a highly skilled Lead Application Support Site Reliability Engineer to join our team at Hitachi Digital Services. As a key member of our Site Reliability Engineering team, you will be responsible for ensuring the availability, reliability, and performance of our services and platforms in a highly transactional 24x7 environment. Key...