Software Engineering Manager, AI Networking
5 months ago
Summary: In this role, you will be a member of the Network AI Software team and part of the larger DC networking organization. The team develops and owns the software stack around collective communication libraries at Meta. At a high level, the team aims to enable Meta-wide ML products and innovations to leverage our large-scale training and inference fleet through an observable, reliable and high-performance distributed AI communication stack. Currently, one of the team’s focuses is building customized features, SW benchmarks, performance tuners and SW stacks around PyTorch to improve full-stack distributed ML reliability and performance (e.g. large-scale GenAI/LLM training) from the trainer down to the network communication layer. We are seeking leaders to work on GenAI/LLM scaling reliability and performance.
Responsibilities:
Help define the technical roadmap for the team, drive execution of associated tasks and support the team in resolving dependencies
Collaborate effectively with other groups such as Hardware, Infrastructure and Operations
Interact with external partners as needed to resolve dependencies associated with objectives
Guide and help team members develop appropriate skill sets to grow in their careers, and where necessary address underperformance
Communicate cross-functionally and drive engineering efforts
Minimum Qualifications:
BS or MS in Computer Science or related technical discipline, or equivalent experience
2+ years of experience managing a networking-related software engineering team
Working knowledge of a network transport stack such as RoCE (RDMA)
Experience with software development for distributed and embedded systems
Experience recruiting and managing software engineers
Preferred Qualifications:
Experience with NCCL and distributed GPU reliability/performance improvement on RoCE/InfiniBand
Experience working with DL frameworks like PyTorch, Caffe2 or TensorFlow
Knowledge of ML, deep learning and LLMs
Public Compensation: $177,000/year to $251,000/year + bonus + equity + benefits
Industry: Internet
Equal Opportunity: Meta is proud to be an Equal Employment Opportunity and Affirmative Action employer. We do not discriminate based upon race, religion, color, national origin, sex (including pregnancy, childbirth, or related medical conditions), sexual orientation, gender, gender identity, gender expression, transgender status, sexual stereotypes, age, status as a protected veteran, status as an individual with a disability, or other applicable legally protected characteristics. We also consider qualified applicants with criminal histories, consistent with applicable federal, state and local law. Meta participates in the E-Verify program in certain locations, as required by law. Please note that Meta may leverage artificial intelligence and machine learning technologies in connection with applications for employment. Meta is committed to providing reasonable accommodations for candidates with disabilities in our recruiting process. If you need any assistance or accommodations due to a disability, please let us know at accommodations-ext@fb.com.
-
Menlo Park, California, United States META Full time Company Overview: META is a leading technology company that aims to bring people closer together. Our Host Networking team is responsible for building high-performance network solutions for our AI workloads, and we are seeking a highly technical manager to lead our AI Transport group. Salary: The estimated salary for this role is $177,000/year to...
-
Software Engineering Manager, AI Transport
2 months ago
Menlo Park, United States META Full time Summary: The Host Networking team is responsible for all aspects of networking specific to servers including networking applications, network transport and analytics and NICs. The team is increasingly focused on building high performance network solutions for our AI workloads. We are looking for a manager who will lead the group developing network drivers...
-
Software Engineer, SystemML
2 days ago
Menlo Park, United States META Full time Summary: In this role, you will be a member of the AI Networking Software team and part of the larger DC networking organization. The team develops and owns the software stack around NCCL (NVIDIA Collective Communications Library), which enables multi-GPU and multi-node data communication through HPC-style collectives. NCCL has been integrated into PyTorch...
-
Software Engineering Manager, AI Transport
2 days ago
Menlo Park, United States META Full time Summary: The Host Networking team is responsible for all aspects of networking specific to servers including networking applications, network transport and analytics and NICs. The team is increasingly focused on building high performance network solutions for our AI workloads. We are looking for a manager who will lead the group that will enhance TCP as the...
-
AI/HPC Network Engineer
2 weeks ago
Menlo Park, United States META Full time Summary: Meta's AI Training and Inference Infrastructure is growing exponentially to support ever increasing use cases of AI. This results in a dramatic scaling challenge that our engineers have to deal with on a daily basis. We need to build and evolve our network infrastructure that connects myriads of GPUs together. In addition, we need to ensure that...
-
AI/HPC Network Engineer
1 month ago
Menlo Park, United States META Full time Summary: Meta's AI Training and Inference Infrastructure is growing exponentially to support ever increasing use cases of AI. This results in a dramatic scaling challenge that our engineers have to deal with on a daily basis. We need to build and evolve our network infrastructure that connects myriads of GPUs together. In addition, we need to ensure that the...
-
Senior AI Networking Manager
2 weeks ago
Menlo Park, California, United States META Full time Company Overview: META is a leader in the development of innovative technologies that enable people to connect, find communities and grow businesses. Our Network AI Software team is part of the DC networking organization and focuses on building a software stack around collective communication libraries. Salary: The estimated annual salary for this role is...
-
Menlo Park, California, United States META Full time About META: META is a leader in the field of artificial intelligence and networking. Our mission is to create innovative technologies that enable fast and reliable communication between machines. Job Description: We are seeking an experienced Software Engineering Manager to join our Network AI Software team. As a member of this team, you will be responsible for...
-
Software Engineering Manager, AI Compiler
5 months ago
Menlo Park, United States META Full time Summary: The MTIA (Meta Training & Inference Accelerator) Software team has been developing a comprehensive AI Compiler strategy and optimizing compiler toolchains. This enables training and inference of Meta’s production DL/ML workloads on the specialized MTIA AI accelerator hardware in a highly performant and flexible way. We are looking for a Software...
-
AI Network Infrastructure Specialist
2 weeks ago
Menlo Park, California, United States META Full time Meta is expanding its AI Training and Inference Infrastructure, driven by increasing use cases of AI. This scaling challenge requires our engineers to build and evolve network infrastructure that connects multiple GPUs together. Job Responsibilities: Design, develop, test, and operate networking systems to support large-scale AI training jobs. Establish global...
-
Engineering Manager
3 weeks ago
Menlo Park, California, United States Brio Digital Full time We're seeking a talented Engineering Manager - AI to lead our team in developing innovative AI solutions. As a key member of our engineering team, you'll work closely with the Founding Engineer and CTO to shape our technical direction. Your responsibilities will include leading cross-functional teams, mentoring junior engineers, and driving technical...
-
Staff Software Engineer, Back End
5 days ago
Menlo Park, United States OSI Engineering Full time We’re seeking a highly skilled Staff Software Engineer to focus on integrating cutting-edge AI services and improving backend platform performance. This role offers the opportunity to work on innovative AI-powered features while ensuring the underlying platform is robust, scalable, and efficient. You will collaborate with...
-
Staff Software Engineer, Back End
1 week ago
Menlo Park, United States OSI Engineering Full time We’re seeking a highly skilled Staff Software Engineer to focus on integrating cutting-edge AI services and improving backend platform performance. This role offers the opportunity to work on innovative AI-powered features while ensuring the underlying platform is robust, scalable, and efficient. You will collaborate with...
-
Manager, Software Engineering, MTIA Software
2 months ago
Menlo Park, United States Meta Inc Full time Summary: The MTIA (Meta Training & Inference Accelerator) Software team is part of the AI Infra PyTorch org. The team’s mission is to explore, develop and help productize high-performance software and hardware technologies for AI at datacenter scale. The team co-optimizes both SW (e.g., algorithms and numerics) and HW (e.g., platform and network) to come up...
-
Technical Program Manager, AI Transport
2 weeks ago
Menlo Park, California, United States META Full time Job Summary: META's Host Networking team is seeking a highly technical manager to lead the AI Transport group. As a Software Engineering Manager, AI Transport, you will be responsible for designing and developing high-performance network solutions for AI workloads. You will also manage engineers working to build, scale, deploy and support network systems for...
-
Cutting-Edge AI Researcher
1 month ago
Menlo Park, California, United States Altera AI Full time At Altera AI, we're pushing the boundaries of innovation and seeking a talented Cutting-Edge AI Researcher to join our team. This is a full-time position based on-site in Menlo Park, with a competitive salary and additional benefits. About Us: We're a cutting-edge research organization dedicated to bringing our groundbreaking ideas to life. Our Research...
-
Senior Software Engineer
2 weeks ago
Menlo Park, California, United States Robinhood Full time Unlocking Financial Inclusivity with AI: Roznovat's aim to democratize finance for all requires innovative solutions, and as a Senior Software Engineer on our Agentic AI team, you will play a key role in shaping this vision. With a strong focus on large-scale system design, you will lead the development of enterprise-grade features that harness the power...
-
Senior AI Research Specialist
3 weeks ago
Menlo Park, California, United States Altera AI Full time Job Title: Research and Development Engineer. Altera AI seeks a talented Research and Development Engineer to drive the advancement of AI research. You will play a pivotal role in bringing our innovative research to life by implementing and experimenting with the latest research techniques in AI and machine learning. Your responsibilities will include...
-
Senior Software Engineer, Agentic AI
4 weeks ago
Menlo Park, United States Robinhood Full time About the team + role: We are seeking a dedicated and ambitious individual to accelerate the development and expansion of products powered by Gen AI to democratize finance at an unprecedented pace. In this role, you'll play a key part in Robinhood’s forward trajectory, collaborating closely with our adept Data Science and Engineering teams. The Gen AI team...
-
Full Stack Software Engineer in Generative AI
1 month ago
Menlo Park, California, United States BITO Full time Company Overview: Bito is a pioneering technology company at the forefront of generative AI, committed to empowering developers with cutting-edge tools. Our mission is to revolutionize software development by leveraging the incredible capabilities of generative AI. We believe that software development will undergo a transformative shift in the next 5-10 years,...