Software Engineering Manager, AI Networking

1 week ago


Menlo Park, California, United States META Full time
Job Summary:

In this role, you will be a key member of the Network AI Software team, part of the larger DC networking organization at Meta. The team is responsible for developing and owning the software stack around collective communication libraries.

The team's primary goal is to enable Meta-wide ML products and innovations to leverage our large-scale training and inference fleet through an observable, reliable, and high-performance distributed AI communication stack.

Currently, the team is focused on building customized features, software benchmarks, performance tuners, and software stacks around PyTorch to improve full-stack distributed ML reliability and performance.

We are seeking a leader to work in the space of GenAI/LLM scaling, reliability, and performance.

Key Responsibilities:

  • Help define the technical roadmap for the team and drive execution of associated tasks.
  • Support the team in resolving dependencies and collaborate effectively with other groups, such as Hardware, Infrastructure, and Operations.
  • Interact with external partners as needed to resolve dependencies associated with objectives.
  • Guide team members in developing the skills they need to grow in their careers, and address underperformance.
  • Communicate cross-functionally and drive engineering efforts.

Requirements:

  • BS or MS in Computer Science or a related technical discipline or equivalent experience.
  • 2+ years of experience managing a networking-related Software Engineering Team.
  • Working knowledge of network transport stacks, such as RoCE (RDMA over Converged Ethernet).
  • Experience with software development for Distributed and Embedded systems.
  • Experience recruiting and managing Software Engineers.

Preferred Qualifications:

  • Experience with NCCL and distributed GPU reliability/performance improvement on RoCE/InfiniBand.
  • Experience working with DL frameworks like PyTorch, Caffe2, or TensorFlow.
  • Knowledge of ML, deep learning, and LLMs.

Compensation:

$177,000/year to $251,000/year + bonus + equity + benefits.

Industry:

Internet.

Equal Opportunity:

Meta is proud to be an Equal Employment Opportunity and Affirmative Action employer. We do not discriminate based on race, religion, color, national origin, sex, sexual orientation, gender, gender identity, gender expression, transgender status, sexual stereotypes, age, status as a protected veteran, status as an individual with a disability, or other applicable legally protected characteristics.

We also consider qualified applicants with criminal histories, consistent with applicable federal, state, and local law. Meta participates in the E-Verify program in certain locations, as required by law. Please note that Meta may leverage artificial intelligence and machine learning technologies in connection with applications for employment.

Meta is committed to providing reasonable accommodations for candidates with disabilities in our recruiting process. If you need any assistance or accommodations due to a disability, please let us know at Meta Accessibility.


