Research Computing Engineer/SRE

PDT Partners | New York, United States | Full time

The Research Computing HPC team is a group of experts solving computing problems in the critical path of Research. We work directly with the Research and Model Implementation teams, providing them with the tools and computing resources to take their ideas from inception to real, tradable products. We are looking for an ambitious, operationally minded software engineer to join our team as we mature and scale our cloud HPC platform into the next iteration of our firm-wide Research platform.

Why join us?
PDT Partners has a stellar 30+ year track record and a reputation for excellence. Our goal is to be the best quantitative investment manager in the world, measured by the quality of our products, not their size. PDT's very high employee retention and mobility speak for themselves. Our people are intellectually extraordinary, and our community is close-knit, down-to-earth, and diverse. Our engineers love to work on challenging and complicated problems, and in return, they have the chance to make a direct impact on our bottom line, without the attitude and bureaucracy of a typical Wall Street firm.

Responsibilities:

We are a small, flat team sitting at the intersection of research, implementation, and platform infrastructure. Our responsibilities span many areas. Below is a sampling of the work you can expect to do:

  • Design and implementation of cloud-based HPC systems. Success in our fast-moving environment requires equal parts engineering and operations. You will be expected to conceive and implement projects both small and large.
  • Running our HPC plant day-to-day. Our research environment is up 24/7, and we want to keep it that way. Everybody on the team contributes to supporting our platform, a load that is thankfully light because of our automation and focus on quality.
  • Implementing automation. We will always choose working smart over working hard. You will be responsible for conceiving and implementing automation, from CI/CD pipelines to production metrics and monitoring of our cloud HPC platform.
  • Obsessive user focus. All members of the team are expected to partner with researchers and engineers to deliver high-quality cloud HPC systems that are efficient and reliable. This includes leading projects to evolve the platform as our needs change.
  • Capacity management and benchmark optimization. Our demand for scale and performance is constant and involves challenging optimization problems for workloads critical to research and trading.

Qualifications:

Below is a list of skills and experiences we think are relevant. Even if you don't think you're a perfect match, we still encourage you to apply because we are committed to developing our people.
  • 5+ years of software engineering and/or systems programming experience
  • 2+ years of experience working with a public cloud such as AWS
  • Mastery of at least one programming language used to build production systems, such as Python or Rust
  • Experience with a production configuration management tool such as Salt/SaltStack
  • Experience with a cloud-based infrastructure-as-code tool such as Terraform
  • Excellent written and verbal communication skills
  • Past experience working with or supporting researchers and/or other developers is a plus
  • Knowledge of NVIDIA GPU management, Slurm, or similar HPC schedulers and resource managers is a plus

Education:
Bachelor's or Master's degree in an engineering or applied sciences field from a rigorous academic program, or equivalent professional experience.

The salary range for this role is between $195,000 and $225,000. This range does not include any potential bonus. Factors that may affect the agreed-upon salary within the range for a particular candidate include years of experience, level of education obtained, skill set, and other external factors.

PRIVACY STATEMENT: For information on ways PDT may collect, use, and process your personal information, please see PDT's privacy notices.
