Hardware Systems Engineer

23 hours ago


Bellevue, Washington, United States META Full time
Hardware Systems Engineer at Meta

Meta is seeking a skilled Hardware Systems Engineer to join our Release to Production (RTP) team working on new NPI hardware.

Our servers and data centers are the foundation upon which our rapidly scaling infrastructure operates efficiently to deliver our innovative services.

The RTP team is responsible for the end-to-end hardware lifecycle of all Meta servers, including prototyping experimental hardware, hands-on pre-production system and hardware debugging and stress testing, and enabling production-ready system monitoring, automated provisioning, and automated remediation of issues.

Hardware Systems Engineers in RTP work closely with HW/SW co-design teams, hardware designers, networking teams, system manufacturers, component vendors, capacity engineering, production engineering, production services, and data center operations teams to enable new systems that will be deployed in our production data centers.

We also work across service and hardware architectures for new AI systems, build prototypes to demonstrate their value, enable go/no-go decisions, and optimize these systems in production.

Responsibilities
  • Interface with external vendors and internal hardware, mechanical, power, thermal, manufacturing, and software engineers to understand system architecture, and guide and develop hardware fault management for various server products.
  • Leverage a deep understanding of RAS (reliability, availability, serviceability) to improve error reporting and error handling mechanisms for better operational quality and cost efficiency.
  • Champion engineering and operational excellence, establishing metrics and process for regular assessment and improvement.
  • Develop visibility through data visualization and implement systemic solutions to hardware health issues.
  • Proactively create experiments and tooling to detect and diagnose hardware/firmware/software health issues.
  • Troubleshoot, diagnose, and root-cause system failures, isolating the failing components and failure scenarios while working with internal and external stakeholders.
  • Drive discussions with external and internal teams on test specifications and methodologies to continuously improve test quality.
Requirements
  • Bachelor's degree in Computer Science, Computer Engineering, relevant technical field, or equivalent practical experience.
  • 5+ years of work experience in one or more domains such as: ASIC development (Silicon design or bringup or characterization), compute (ARM, x86), AI-ML hardware/software (GPUs, TPUs).
  • Knowledge of the architecture and components of at least one of the following products: server, PC, or laptop.
  • Development or debug experience in one or more of the following areas: hardware fault management, error reporting, or error handling on hardware products.
Preferred Qualifications
  • 7+ years of experience in a subset of the following AI systems areas: accelerators (GPU/ASIC), kernel development, performance optimization (e.g., on NVIDIA, AMD, Intel, or other accelerators), computer architecture, HPC communication libraries (e.g., NCCL, MPI), performance enablement, tracing, profiling, and debugging.
  • Experience with architecture of disaggregated systems at scale.
  • Understanding of hardware development process and how to scope out test plans accordingly.
  • Experience troubleshooting problems at the system level, spanning multiple components as well as hardware/firmware/software boundaries.

For those who live in or expect to work from California if hired for this position, please click here for additional information.

Meta is committed to providing reasonable support (called accommodations) in our recruiting processes for candidates with disabilities, long term conditions, mental health conditions or sincerely held religious beliefs, or who are neurodivergent or require pregnancy-related support.

If you need support, please reach out to Meta's Disability Accommodations team.

Individual compensation is determined by skills, qualifications, experience, and location. Compensation details listed in this posting reflect the base hourly rate, monthly rate, or annual salary only, and do not include bonus, equity or sales incentives, if applicable.

In addition to base compensation, Meta offers benefits. Learn more about benefits at Meta.


