Software Engineer, Research Infrastructure

3 weeks ago


San Francisco, United States Anthropic Limited Full time

Anthropic is seeking an experienced engineer for our Research Infrastructure team. You'll lead initiatives supporting some of the largest, most sophisticated clusters in the industry, used to train, research, and ultimately serve AI models. Your work will be crucial in ensuring Anthropic can continue to train frontier models reliably and safely.

The Research Infrastructure team develops and scales the systems that enable researchers to iterate quickly, and scales key systems and components used by researchers during the development phase to work at production scale as our model footprint grows.

Responsibilities

Lead the build-out of industry-leading AI clusters (thousands to hundreds of thousands of machines), partnering closely with cloud service providers on cluster build-out and required features

Consult with different stakeholders to deeply understand infrastructure and compute needs, identifying potential solutions to support frontier research and product development

Set technical strategy and oversee development of high scale, reliable infrastructure systems

Mentor top technical talent

Design processes (e.g. postmortem review, incident response, on-call rotations) that help the team operate effectively and never fail the same way twice

You may be a good fit if you

Have 8+ years of relevant industry experience and 3+ years leading large scale, complex projects or teams as an engineer or tech lead

Are obsessed with distributed systems at scale, and have a good handle on ML foundations and/or ML systems

Have a passion for understanding and supporting the needs of internal partners such as research

Have excellent communication skills to build consensus with stakeholders, both internally and externally

Possess deep knowledge of modern cloud infrastructure including Kubernetes, Infrastructure as Code, AWS, and GCP

Have functional knowledge of Python/Rust

Strong candidates may also have experience with

Security and privacy best practices

Machine learning infrastructure like GPUs, TPUs, or Trainium, as well as supporting networking infrastructure like NCCL

Low-level systems, for example Linux kernel tuning and eBPF

Technical expertise: Quickly understanding systems design tradeoffs, keeping track of rapidly evolving software systems

Representative projects:

Model lifecycle management

Streamlining model deployments across various supported environments

Model & weights caching

Weight management & access controls

Sandboxing/ exec environment for generated code

Deadline to apply: None. Applications will be reviewed on a rolling basis.




  • San Francisco, United States Lionheart Ventures Company Defunct Full time

    Principal Software Engineer, Infrastructure Remote-Friendly (Travel-Required), San Francisco, CA, Seattle, WA, New York City, NY About Anthropic Anthropic’s mission is to create reliable, interpretable, and steerable AI systems. We want AI to be safe and beneficial for our users and for society as a whole. Our team is a quickly growing group of committed...


  • San Francisco, California, United States Benchling Full time

    Biotechnology is rewriting life as we know it, from the medicines we take, to the crops we grow, the materials we wear, and the household goods that we rely on every day. But moving at the new speed of science requires better technology. Benchling's mission is to unlock the power of biotechnology. The world's most innovative biotech companies use Benchling's...


  • San Francisco, United States OpenAI Full time

    About the Team The Applied Engineering team works across research, engineering, product, and design to bring OpenAI’s technology to consumers and businesses. We seek to learn from deployment and distribute the benefits of AI, while ensuring that this powerful tool is used responsibly and safely. Safety is more important to us than unfettered growth. ...


  • Software Engineer

    1 week ago


    San Francisco, United States Cloudflare Inc Full time

    Available Locations: Remote - US, Eastern or Central-based timezone About the Role An engineering role at Cloudflare provides an opportunity to address some big challenges, at scale. We believe that with our talented team, we can solve some of the biggest security, reliability and performance problems facing the Internet. Just how big? We have in excess of...


  • San Francisco, United States Instabase Full time

    At Instabase, we're passionate about democratizing access to cutting-edge AI innovation to enable any organization to solve previously unsolvable unstructured data problems in their industry. With customers representing some of the largest and most complex organizations in the world, and investors like Greylock, Andreessen Horowitz, and Index Ventures, our...


  • San Francisco, United States OpenAI Full time

    About the Team The Applied Engineering team works across research, engineering, product, and design to bring OpenAI's technology to consumers and businesses. We seek to learn from deployment and distribute the benefits of AI, while ensuring that this powerful tool is used responsibly and safely. Safety is more important to us than unfettered growth. About...


  • San Francisco, California, United States Wispr AI Full time

    Wispr is building a more natural way to interact with technology with neural interfaces. We have an elite team of engineers, product designers, and research scientists building magic. About Wispr: We've raised $25M from top-tier VCs like NEA and 8VC. Our angels and advisors include Chester Chipperfield (product lead for the first Apple Watch), Ben Jones (COO,...


  • San Francisco, United States Fathom Full time

    Fathom is on a mission to use AI to understand and structure the world's medical data, starting by making sense of the terabytes of clinician notes contained within the electronic health records of the world's largest health systems. Our deep learning engine automates the translation of patient records into the billing codes used for healthcare provider...


  • San Francisco, United States Orb Full time

    Mission Orb is on an ambitious mission to provide every business with the infrastructure to unlock their revenue. Best-in-class businesses find ways to effectively align their monetization to product usage, whether that's through seats, consumption, feature limits, or usage-based tiers. Orb brings that opportunity to every software company. We are reimagining...


  • San Francisco, CA, United States Infra Full time

    [Full Time] Software Engineer, Infrastructure at Infra (Canada) | BEAMSTART Jobs Software Engineer, Infrastructure Infra Canada Date Posted 26 Jun, 2022 Work Location San Francisco, Canada Salary Offered Not Specified Job Type Full Time Experience Required No experience required Remote Work Yes Stock Options No Vacancies 1...


  • San Francisco, United States Joinslash Full time

    About Slash Slash is the premier banking platform for small businesses. Our all in one virtual card, bill pay, accounting, and invoicing platform helps entrepreneurs stay on top of their finances, allowing them to spend more time doing what they love. Slash powers hundreds of millions of dollars a year in purchases. Our investors include Y Combinator,...

  • Software Engineer

    1 month ago


    San Francisco, United States Plaid Full time

    Software Engineer - Infrastructure (SRE) at Plaid (United States). Date Posted: 06 Jul, 2022. Work Location: San Francisco, United States. Salary Offered: Not Specified. Job Type: Full Time. Experience Required: 3+ years. Remote Work: No. Stock Options: Yes. Vacancies: 1...

  • Software Engineer

    1 week ago


    San Francisco, United States Twitter Full time

    Software Engineer - Infrastructure Foundation. Locations: San Francisco, CA; San Jose, CA; Seattle, WA. Time type: Full time. Posted: Yesterday. Job requisition ID: R100245. Are you prepared to join the X team and help build the ultimate real-time information-sharing app, revolutionizing how...