Software Engineer

2 days ago


Menlo Park, California, United States Meta Full time
Job Title: Software Engineer - Distributed ML Training

Meta is seeking a highly skilled Software Engineer to join our Network.AI Software team. As a member of this team, you will be responsible for developing and owning the software stack around NCCL (NVIDIA Collective Communications Library), which enables multi-GPU and multi-node data communication through HPC-style collectives.

The team aims to enable Meta-wide ML products and innovations to leverage our large-scale GPU training and inference fleet through an observable, reliable, and high-performance distributed AI/GPU communication stack. Currently, one of the team's focus areas is building customized features, software benchmarks, performance tuners, and software stacks around NCCL and PyTorch to improve full-stack distributed ML reliability and performance (e.g., large-scale GenAI/LLM training) from the trainer down to the inter-GPU and network communication layer.

Responsibilities:
  1. Enabling reliable and highly scalable distributed ML training on Meta's large-scale GPU training infrastructure, with a focus on GenAI/LLM scaling
Requirements:
  1. Bachelor's degree in Computer Science, Computer Engineering, or relevant technical field, or equivalent practical experience
  2. Specialized experience in one or more of the following machine learning/deep learning domains: Distributed ML Training, GPU architecture, ML systems, AI infrastructure, high performance computing, performance optimizations, or Machine Learning frameworks (e.g. PyTorch)
Preferred Qualifications:
  1. PhD in Computer Science, Computer Engineering, or relevant technical field
  2. Experience with NCCL and distributed GPU reliability/performance improvement on RoCE/InfiniBand
  3. Experience working with DL frameworks like PyTorch, Caffe2, or TensorFlow
  4. Experience with both data parallel and model parallel training, such as Distributed Data Parallel, Fully Sharded Data Parallel (FSDP), Tensor Parallel, and Pipeline Parallel
  5. Experience in AI framework and trainer development for accelerating large-scale distributed deep learning models
  6. Experience in HPC and parallel computing
  7. Knowledge of GPU architectures and CUDA programming
  8. Knowledge of ML, deep learning, and LLMs
Compensation:

$70.67/hour to $208,000/year + bonus + equity + benefits

Industry:

Internet

Equal Opportunity:

Meta is proud to be an Equal Employment Opportunity and Affirmative Action employer. We do not discriminate based upon race, religion, color, national origin, sex (including pregnancy, childbirth, or related medical conditions), sexual orientation, gender, gender identity, gender expression, transgender status, sexual stereotypes, age, status as a protected veteran, status as an individual with a disability, or other applicable legally protected characteristics. We also consider qualified applicants with criminal histories, consistent with applicable federal, state, and local law. Meta participates in the E-Verify program in certain locations, as required by law. Please note that Meta may leverage artificial intelligence and machine learning technologies in connection with applications for employment.

Meta is committed to providing reasonable accommodations for candidates with disabilities in our recruiting process. If you need any assistance or accommodations due to a disability, please let us know at https://www.meta.com/accessibility.


