Systems Engineering Specialist, AI Fleet Operations

2 months ago


Menlo Park, California, United States META Full time

Meta is looking for a Systems Engineering Specialist to join our Release to Production (RTP) team. Our infrastructure, comprising servers and data centers, is crucial for the seamless operation of our rapidly expanding services. The RTP team oversees the Hardware Lifecycle of all Meta servers, which includes hands-on system and hardware debugging, stress testing, and production-ready system monitoring, automated provisioning, and issue remediation.

Key Responsibilities:

  • Collaborate with external vendors and internal teams, including hardware, mechanical, power, thermal, manufacturing, and software engineers, to grasp system architecture and design effective test suites for diverse architectures.
  • Initiate experiments and develop tools to identify and resolve hardware, firmware, and software health issues proactively.
  • Create a comprehensive test framework for large-scale automation within the fleet during product development and post-mass production.
  • Execute remediation strategies across the software and hardware stack, maintaining meticulous procedural records and data logs.
  • Troubleshoot, diagnose, and identify root causes of system failures in collaboration with stakeholders, and communicate resolutions and findings internally.
  • Enhance visibility through data visualization and implement systemic solutions to address hardware health challenges.
  • Facilitate discussions with both external and internal teams regarding test specifications and methodologies to continually enhance test quality.
  • Support Meta's sustainability goals by assessing the carbon footprint of new hardware designs and infrastructure. Collaborate with Net Zero teams to implement strategies for reuse, recycling, energy-efficient computing, and quality practices for both deployed and decommissioned hardware.

Minimum Qualifications:

  • Bachelor's degree in Computer Science, Computer Engineering, or a related technical field, or equivalent practical experience.
  • A minimum of 4 years of experience in hardware system support, with a solid understanding of server architecture and components.
  • Experience in Energy Aware Computing and/or Sustainable Infrastructure Design.
  • Proficiency in Linux and scripting, with experience in modifying system configurations and assessing the impact of changes.
  • Experience working within a matrix organization, engineering various server system/data center products.

Preferred Qualifications:

  • 4+ years of experience in large-scale production support.
  • 4+ years of experience encompassing full system technologies and lifecycle.
  • Experience in supporting AI/HPC systems and related components at scale, particularly in post-production hyperscale environments.
