AI/HPC Network Architect

4 weeks ago


Menlo Park, California, United States | META | Full time

Job Summary:

Meta's AI Training and Inference Infrastructure is rapidly expanding to support the increasing use of AI. This growth presents a significant scaling challenge that our engineers must address daily. We need to design and evolve our network infrastructure to connect numerous GPUs together efficiently.

To improve performance, we continuously look for opportunities across our infrastructure stack, including the network fabric, host networking, communication libraries, and scheduling infrastructure.

Key Responsibilities:

  • Design, develop, test, and operate networking systems to support large-scale AI training jobs.
  • Research, develop, and deploy various technologies and network topologies to evolve and scale our AI networks.
  • Collaborate with hardware, software, and sourcing teams to develop new networking solutions and influence the future of networking and its associated infrastructure.
  • Define and develop optimized network automation tools and systems, including configuration, provisioning, monitoring, alarming, auto-remediation, and more.

Requirements:

  • Bachelor's degree in Computer Science, Computer Engineering, or a relevant technical field, or equivalent practical experience.
  • Experience in designing, deploying, and operating datacenter networks at scale.
  • Proficiency in programming languages such as Python, C++, and Go.
  • Experience in network automation software leveraging software-defined networking principles.

Preferred Qualifications:

  • 4+ years of experience working on networks supporting large-scale training workloads.
  • Expert knowledge of InfiniBand (IB), RDMA, and RoCE networks.
  • Understanding of AI training workloads and their demands on networks.
  • Understanding of RDMA congestion control mechanisms on IB and RoCE networks.

Compensation:

$147,000/year to $208,000/year + bonus + equity + benefits

Industry:

Internet

Equal Opportunity:

Meta is proud to be an Equal Employment Opportunity and Affirmative Action employer. We do not discriminate based on race, religion, color, national origin, sex, sexual orientation, gender, gender identity, gender expression, transgender status, sexual stereotypes, age, status as a protected veteran, status as an individual with a disability, or other applicable legally protected characteristics.


