Software Engineering Manager, AI Networking

4 weeks ago


Menlo Park, California, United States META Full time
Job Summary:

In this role, you will be a key member of the Network AI Software team, part of the larger DC networking organization at Meta. The team is responsible for developing and owning the software stack around collective communication libraries.

The team's primary goal is to enable Meta-wide ML products and innovations to leverage our large-scale training and inference fleet through an observable, reliable, and high-performance distributed AI communication stack.

Currently, the team is focused on building customized features, software benchmarks, performance tuners, and software stacks around PyTorch to improve full-stack distributed ML reliability and performance.

We are seeking a leader to work in the space of GenAI/LLM scaling, reliability, and performance.

Key Responsibilities:

  • Help define the technical roadmap for the team and drive execution of associated tasks.
  • Support the team in resolving dependencies and collaborate effectively with other groups, such as Hardware, Infrastructure, and Operations.
  • Interact with external partners as needed to resolve dependencies associated with objectives.
  • Guide and help team members develop appropriate skillsets to grow in their careers and address underperformance.
  • Communicate cross-functionally and drive engineering efforts.

Requirements:

  • BS or MS in Computer Science or a related technical discipline or equivalent experience.
  • 2+ years of experience managing a networking-related Software Engineering Team.
  • Working knowledge of network transport stacks, such as RoCE (RDMA over Converged Ethernet).
  • Experience with software development for Distributed and Embedded systems.
  • Experience recruiting and managing Software Engineers.

Preferred Qualifications:

  • Experience with NCCL and distributed GPU reliability/performance improvement on RoCE/InfiniBand.
  • Experience working with DL frameworks like PyTorch, Caffe2, or TensorFlow.
  • Knowledge of ML, deep learning, and LLMs.

Compensation:

$177,000/year to $251,000/year + bonus + equity + benefits.

Industry:

Internet.

Equal Opportunity:

Meta is proud to be an Equal Employment Opportunity and Affirmative Action employer. We do not discriminate based on race, religion, color, national origin, sex, sexual orientation, gender, gender identity, gender expression, transgender status, sexual stereotypes, age, status as a protected veteran, status as an individual with a disability, or other applicable legally protected characteristics.

We also consider qualified applicants with criminal histories, consistent with applicable federal, state, and local law. Meta participates in the E-Verify program in certain locations, as required by law. Please note that Meta may leverage artificial intelligence and machine learning technologies in connection with applications for employment.

Meta is committed to providing reasonable accommodations for candidates with disabilities in our recruiting process. If you need any assistance or accommodations due to a disability, please let us know at Meta Accessibility.


