AI Infrastructure Quality Engineer

2 weeks ago


Sunnyvale, California, United States | Cerebras | Full time

Cerebras Systems is at the forefront of innovation in AI computing, having developed a revolutionary chip and system that transforms deep learning applications. Our technology enables machine learning researchers to achieve remarkable speeds in both training and inference tasks, driving AI advancements to unprecedented levels.

The recently introduced Condor Galaxy 1 (CG-1) exemplifies our dedication to enhancing AI computing capabilities. With an extraordinary processing power of 4 ExaFLOPS, 54 million cores, and a 64-node architecture, the CG-1 is the first of nine powerful supercomputers that will be constructed and operated through a strategic alliance between Cerebras and G42. This collaboration aims to redefine AI potential by establishing a network of interconnected supercomputers that will collectively deliver an astounding 36 ExaFLOPS of AI compute power upon completion.

Role Overview
  • Assess and recommend Data Center equipment, including Switches, Routers, Servers, NICs, and Transceivers, focusing on performance enhancement and cost efficiency.
Key Responsibilities
  • Identify and implement experiments, tools, and methodologies to evaluate complex AI Infrastructure equipment, including Switches, Routers, Servers, NICs, and Transceivers, pushing the limits of hardware design and system integration.
  • Collaborate with equipment vendors to assess the performance of newly launched hardware and address any defects.
  • Design and establish test labs and test beds to rigorously evaluate vendor equipment from leading companies.
  • Work alongside architects and software engineers to develop test cases, create test scripts, execute tests, and document evaluation results from various vendors.
  • Troubleshoot and resolve issues in partnership with other teams and vendors.
  • Provide solutions for effective networking design tailored for AI infrastructure.
  • Design, install, configure, and maintain complex networks for AI applications.
  • Develop and optimize server system benchmarks, leveraging a deep understanding of server architecture and workload characterization.
Qualifications and Skills Required
  • 15+ years of experience in software development, quality assurance, and system testing of switches and routers at a networking equipment vendor.
  • Bachelor's degree or higher in Electrical Engineering, Computer Engineering, Computer Science, or related fields.
  • Strong understanding of RDMA congestion control mechanisms on InfiniBand and RoCE Networks.
  • In-depth knowledge of networking protocols and features such as BGP, PFC, ECN, QoS, MLAG, ECMP, and VRF.
  • Experience with computer system architecture, particularly in CPU SoC or Platform Architecture, Interconnect Fabric, and Memory subsystems.
  • Proven experience in designing and implementing large-scale switching and routing networks.
  • Exceptional technical ability, including strong problem-solving, design, coding, and debugging skills.
  • Expertise in Linux tools such as lspci, ping, traceroute, tcpdump, ifconfig, ip link, ip route, and arp.
  • Proficiency in Python programming.
  • Familiarity with network test tools such as Ixia and SmartBits.
Why Choose Cerebras

At Cerebras, we believe that those serious about software should also be involved in hardware development. Our groundbreaking architecture is unlocking new possibilities within the AI industry. With numerous model releases and rapid growth, we are at a pivotal moment in our business. Our team members highlight five main reasons for their commitment to Cerebras:

  • Develop a revolutionary AI platform that transcends GPU limitations.
  • Publish and open-source cutting-edge AI research.
  • Contribute to one of the fastest AI supercomputers globally.
  • Experience job stability combined with the dynamism of a startup.
  • Thrive in a straightforward, non-corporate work culture that respects individual beliefs.

Cerebras Systems is dedicated to fostering an equal and diverse workplace and is proud to be an equal opportunity employer. We celebrate diverse backgrounds, perspectives, and skills, believing that inclusive teams create superior products and companies. We strive daily to cultivate an environment that empowers individuals to excel through continuous learning, growth, and mutual support.


