Principal AI Network Architect

3 weeks ago


Idaho City, United States Microsoft Full time

Microsoft Silicon, Cloud Hardware, and Infrastructure Engineering (SCHIE) is the team behind Microsoft’s expanding Cloud Infrastructure and responsible for powering Microsoft’s “Intelligent Cloud” mission. SCHIE delivers the core infrastructure and foundational technologies for Microsoft's more than 200 online businesses, including Bing, MSN, Office 365, Xbox Live, Teams, OneDrive, and the Microsoft Azure platform, globally with our server and data center infrastructure, security and compliance, operations, globalization, and manageability solutions. Our focus is on smart growth, high efficiency, and delivering a trusted experience to customers and partners worldwide, and we are looking for passionate engineers to help achieve that mission.

As Microsoft's cloud business continues to grow, the ability to deploy new offerings and hardware infrastructure on time, in high volume, with high quality and lowest cost is of paramount importance. To achieve this goal, the Cloud Hardware Systems Engineering (CHSE) team is instrumental in defining and delivering operational measures of success for hardware manufacturing, improving the planning process, quality, delivery, scale and sustainability related to Microsoft cloud hardware. We are looking for seasoned engineers with a dedicated passion for customer-focused solutions, insight and industry knowledge to envision and implement future technical solutions that will manage and optimize the Cloud infrastructure.

We are looking for a Principal AI Network Architect to join the team.

Responsibilities

  • Technology Leadership: Spearhead architectural definition and innovation for next-generation GPU and AI accelerator platforms, with a focus on ultra-high bandwidth, low-latency backend networks. Drive system-level integration across compute, storage, and interconnect domains to support scalable AI training workloads.
  • Cross-Functional Collaboration: Partner with silicon, firmware, and datacenter engineering teams to co-design infrastructure that meets performance, reliability, and deployment goals. Influence platform decisions across rack, chassis, and pod-level implementations.
  • Technology Partnerships: Cultivate deep technical relationships with silicon vendors, optics suppliers, and switch fabric providers to co-develop differentiated solutions. Represent Microsoft in joint architecture forums and technical workshops.
  • Architectural Clarity: Evaluate and articulate tradeoffs across electrical, mechanical, thermal, and signal integrity domains. Frame decisions in terms of TCO, performance, scalability, and deployment risk. Lead design reviews and contribute to PRDs and system specifications.
  • Industry Influence: Shape the direction of hyperscale AI infrastructure by engaging with standards bodies (e.g., IEEE 802.3), influencing component roadmaps, and driving adoption of novel interconnect protocols and topologies.

Qualifications

Required Qualifications:

  • Bachelor's Degree in Electrical Engineering, Computer Engineering, Mechanical Engineering, or related field AND 8+ years technical engineering experience
  • OR Master's Degree in Electrical Engineering, Computer Engineering, Mechanical Engineering, or related field AND 7+ years technical engineering experience
  • OR equivalent experience
  • 5+ years of experience in designing AI backend networks and integrating them into large-scale GPU systems.

Other Requirements:

The ability to meet Microsoft, customer and/or government security screening requirements is required for this role. These requirements include but are not limited to the following specialized security screenings:

  • Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud Background Check upon hire/transfer and every two years thereafter.

Preferred Qualifications

  • Proven expertise in system architecture across compute, networking, and accelerator domains.
  • Deep understanding of RDMA protocols (RoCE, InfiniBand), congestion control (DCQCN), and Layer 2/3 routing.
  • Experience with optical interconnects (e.g., PSM, WDM), link budget analysis, and transceiver integration.
  • Familiarity with signal integrity modeling, link training, and physical layer optimization.
  • Experience architecting backend networks for AI training and inference workloads, including Hamiltonian cycle traffic and collective operations (e.g., all-reduce, all-gather).
  • Hands-on design of high-radix switches (≥400Gbps per port), orthogonal chassis, and cabled backplanes.
  • Knowledge of chip-to-chip and chip-to-module interfaces, including error correction and equalization techniques.
  • Experience with custom NIC IPs and transport layers for secure, reliable packet delivery.
  • Familiarity with AI model execution pipelines and their impact on pod-level network design and latency SLAs.
  • Prior contributions to hyperscale deployments or cloud-scale AI infrastructure programs.

Hardware Engineering IC5 - The typical base pay range for this role across the U.S. is USD $139,900 - $274,800 per year. There is a different range applicable to specific work locations within the San Francisco Bay area and New York City metropolitan area, and the base pay range for this role in those locations is USD $188,000 - $304,200 per year.

Find additional benefits and pay information here: https://careers.microsoft.com/us/en/us-corporate-pay

Microsoft will accept applications for the role until November 13, 2025.

#SCHIE #azurehwjobs #CHSE #MSCareerEvents25

Microsoft is an equal opportunity employer. Consistent with applicable law, all qualified applicants will receive consideration for employment without regard to age, ancestry, citizenship, color, family or medical care leave, gender identity or expression, genetic information, immigration status, marital status, medical condition, national origin, physical or mental disability, political affiliation, protected veteran or military status, race, ethnicity, religion, sex (including pregnancy), sexual orientation, or any other characteristic protected by applicable local laws, regulations and ordinances. If you need assistance and/or a reasonable accommodation due to a disability during the application process, read more about requesting accommodations: https://careers.microsoft.com/v2/global/en/accessibility.html.



  • Foster City, United States Coupa Full time

    Coupa makes margins multiply through its community-generated AI and industry-leading total spend management platform for businesses large and small. Coupa AI is informed by trillions of dollars of direct and indirect spend data across a global network of 10M+ buyers and suppliers. We empower you with the ability to predict, prescribe, and automate smarter,...


  • Redwood City, United States Snorkel AI Full time

    About Snorkel: At Snorkel, we believe meaningful AI doesn’t start with the model, it starts with the data. We’re on a mission to help enterprises transform expert knowledge into specialized AI at scale. The AI landscape has gone through incredible changes between 2015, when Snorkel started as a research project in the Stanford AI Lab, to the generative AI...


  • Redwood City, CA, United States Snorkel AI Full time

    About Snorkel: At Snorkel, we believe meaningful AI doesn't start with the model, it starts with the data. We're on a mission to help enterprises transform expert knowledge into specialized AI at scale. The AI landscape has gone through incredible changes between 2015, when Snorkel started as a research project in the Stanford AI Lab, to the generative AI...

  • AI Security

    2 weeks ago


    Dakota City, United States Highmark Health Full time

    A leading healthcare organization is seeking a Principal Architect to oversee enterprise-wide data security and protection strategies. This role addresses challenges in data security, especially related to AI and machine learning. The ideal candidate will have extensive experience in information security and be adept at collaborating across departments to...

  • AI Security

    2 weeks ago


    Nevada City, United States Highmark Health Full time

    A leading health organization in California seeks an experienced Data Protection and Security – Principal Architect to oversee enterprise-wide data security and protection strategies. The ideal candidate will have extensive experience in Information Security and a strong understanding of data security challenges presented by AI and ML technologies. This...

  • AI Security

    2 weeks ago


    Oregon City, United States Highmark Health Full time

    A leading health services organization in Oregon City seeks a Principal Architect for Data Protection and Security. This role focuses on defining and implementing data security strategies, mentoring security professionals, and ensuring robust data handling practices. Ideal candidates will have extensive experience in information security, particularly in AI...


  • Jefferson City, United States Highmark Health Full time

    A healthcare organization in Jefferson City seeks a Principal Architect in Data Protection and Security to lead enterprise-wide strategies for data security. You will guide architecture design, mentor teams, and ensure compliance with best practices while leveraging AI technologies. The ideal candidate has extensive experience in information security, data...

  • AI Security

    2 weeks ago


    Arkansas City, United States Highmark Health Full time

    A leading healthcare organization is seeking a Principal Architect to define and implement enterprise-wide data security strategies. The role involves working with stakeholders to embed security practices into operations, mentoring others in the field, and addressing challenges posed by AI and ML technologies. Applicants should have extensive experience in...

  • AI Security

    2 weeks ago


    Arizona City, United States Highmark Health Full time

    A leading healthcare organization in Arizona is seeking a Data Protection and Security – Principal Architect to lead enterprise-wide data security strategies. Responsibilities include defining sustainable architectures, mentoring teams, and ensuring data protection practices, especially in AI and ML technologies. Candidates should have extensive experience...

  • AI Security

    2 weeks ago


    Illinois City, United States Highmark Health Full time

    A leading health organization is seeking a Principal Architect in Data Protection and Security to enhance data security strategies across services. The ideal candidate will have over 10 years of experience in Information Security, focusing on data and asset protection especially concerning AI and ML technologies. Responsibilities include developing secure...