Hardware Systems Engineer

4 hours ago


Menlo Park, California, United States META Full time
Job Summary

Meta is seeking a highly skilled Hardware Systems Engineer to join our Release to Production (RTP) team. As a key member of this team, you will be responsible for the end-to-end Hardware Lifecycle of all Meta servers, including prototyping, debugging, and stress testing.

The RTP team is responsible for ensuring the efficient operation of our servers and data centers, which are the foundation of our rapidly scaling infrastructure. You will work closely with HW/SW co-design teams, hardware designers, networking teams, system manufacturers, component vendors, capacity engineering, production engineering, production services, and data center operations teams to enable new systems that will be deployed in our production data centers.

Responsibilities
  • Interface with external vendors and internal hardware, mechanical, power, thermal, manufacturing, and software engineers to understand system architecture and guide the development of Hardware Fault Management for various server products.
  • Leverage your deep understanding of RAS (reliability, availability, serviceability) to improve error reporting and error handling mechanisms, raising operational quality and cost efficiency.
  • Champion engineering and operational excellence, establishing metrics and processes for regular assessment and improvement.
  • Develop visibility through data visualization and implement systemic solutions to hardware health issues.
  • Proactively create experiments and tooling to detect and diagnose hardware/firmware/software health issues.
  • Troubleshoot, diagnose, and root cause system failures and isolate components/failure scenarios while working with internal and external stakeholders.
  • Drive discussions with external and internal teams on test specifications and methodologies to continuously improve test quality.
Requirements
  • Bachelor's degree in Computer Science, Computer Engineering, or a relevant technical field, or equivalent practical experience.
  • 5+ years of work experience in one or more domains such as ASIC development, compute (ARM, x86), or AI/ML hardware/software (GPUs, TPUs).
  • Knowledge of the architecture and components of at least one of the following product types: server, PC, or laptop.
  • Development or debug experience in one or more of the following areas: hardware fault management, error reporting, or error handling on hardware products.
Preferred Qualifications
  • 7+ years of experience in a subset of the following AI systems areas: accelerators (GPU/ASIC), kernel development, performance optimization (e.g., on NVIDIA, AMD, Intel, or other accelerators), computer architecture, HPC communication libraries (e.g., NCCL, MPI), performance enablement, tracing, profiling, and debugging.
  • Experience with architecture of disaggregated systems at scale.
  • Understanding of the hardware development process and how to scope test plans accordingly.
  • Experience troubleshooting system-level problems that span multiple components and cross hardware/firmware/software boundaries.
Compensation

$124,000/year to $191,000/year + bonus + equity + benefits

Industry

Internet

Equal Opportunity

Meta is proud to be an Equal Employment Opportunity and Affirmative Action employer. We do not discriminate based upon race, religion, color, national origin, sex (including pregnancy, childbirth, or related medical conditions), sexual orientation, gender, gender identity, gender expression, transgender status, sexual stereotypes, age, status as a protected veteran, status as an individual with a disability, or other applicable legally protected characteristics.

We also consider qualified applicants with criminal histories, consistent with applicable federal, state, and local law. Meta participates in the E-Verify program in certain locations, as required by law. Please note that Meta may leverage artificial intelligence and machine learning technologies in connection with applications for employment.

Meta is committed to providing reasonable accommodations for candidates with disabilities in our recruiting process. If you need any assistance or accommodations due to a disability, please let us know.


