HPC/ML Networking Systems Architect

3 weeks ago


Cupertino, California, United States Amazon Full time
Company Overview
Amazon Web Services (AWS) is the world's most comprehensive and broadly adopted cloud platform, pioneering cloud computing and continuously innovating to power businesses from startups to Global 500 companies. Our team focuses on building networking solutions for Machine Learning (ML) and High-Performance Computing (HPC) workloads on AWS.

Salary
The estimated salary range for this position is between $151,300/year and $261,500/year, depending on the location and job-related knowledge, skills, and experience. Amazon offers a total compensation package including equity, sign-on payments, and other forms of compensation in addition to a full range of medical, financial, and/or other benefits.

Job Description
We are seeking an experienced software engineer with low-latency networking or interconnect expertise to optimize customer experience by designing systems that enable scaling network-intensive workloads over thousands of CPUs, GPUs, and TPUs. This role is at the forefront of AI/ML: a good deal of the day is spent optimizing the networking for the latest AI workloads, such as LLMs. The ideal candidate will have extensive experience in low-latency networking and collective operations, such as HPC network fabric or machine learning accelerator cluster systems. Experience in high-frequency trading networking, high-speed wireless networking, or low-latency interconnects such as PCIe or CXL is also applicable. Proficiency in C/C++ and a deep understanding of Linux and kernel-level programming are essential. Strong problem-solving skills and the ability to troubleshoot complex networking issues are required, along with excellent communication skills to work effectively in a collaborative team environment.

Required Skills and Qualifications
5+ years of non-internship professional software development experience; 5+ years of programming experience with at least one software programming language; 5+ years of experience leading the design or architecture of new and existing systems; 5+ years of full software development life cycle experience, including coding standards, code reviews, source control management, build processes, testing, and operations. Experience as a mentor, tech lead, or leader of an engineering team is preferred. A Bachelor's degree in computer science or equivalent is also preferred.

Benefits
AWS values diverse experiences and encourages candidates from all backgrounds to apply. Our employee-led affinity groups foster a culture of inclusion that empowers us to be proud of our differences. Ongoing events and learning experiences inspire us to never stop embracing our uniqueness. We strive for flexibility as part of our working culture, valuing work-life harmony and providing endless knowledge-sharing, mentorship, and career-advancing resources.

  • Cupertino, California, United States Amazon Full time

    About the Role: As a Cloud Infrastructure Engineer - HPC/ML, you will be responsible for designing and optimizing networking solutions for Machine Learning (ML) and High-Performance Computing (HPC) workloads on AWS. You will collaborate with cross-functional teams and engage with customers to gather feedback and continuously improve our offerings. Your day...


  • Cupertino, California, United States Amazon Full time

    Job Description: We are seeking an experienced software engineer to design and optimize systems for low-latency networking and collective operations. This role involves working with cross-functional teams to develop innovative solutions for customers who require specialized security solutions for their cloud services. The ideal candidate will have extensive...


  • Cupertino, California, United States Amazon Full time

    **Our Ideal Candidate** Our ideal candidate will have extensive experience in low-latency networking and collective operations, such as HPC network fabric or machine learning accelerator cluster systems. Also applicable is experience in high-frequency trading networking, high-speed wireless networking, or low-latency interconnects such as PCIe or CXL....


  • Cupertino, California, United States Amazon Full time

    Key Responsibilities
    * Designing and optimizing networking solutions for ML and HPC workloads
    * Collaborating with cross-functional teams to deliver scalable and reliable systems
    * Engaging with customers to gather feedback and improve our offerings
    * Participating in innovative learning experiences and benefiting from employee-led affinity groups that foster a...


  • Cupertino, California, United States Amazon Full time

    Unlock the Future of AI and HPC with Amazon: Annapurna Labs, a pioneering organization within AWS, is seeking an experienced Software Development Engineer to join our elite team. As a High-Performance Networking Solutions Engineer, you will design and optimize cutting-edge systems for Machine Learning (ML) and High-Performance Computing (HPC) workloads on...


  • Cupertino, California, United States Amazon Full time

    **About the Role** We are seeking an experienced software engineer with low-latency networking or interconnect expertise to optimize customer experience by designing systems that enable scaling network-intensive workloads over thousands of CPUs, GPUs, and TPUs. This role is at the forefront of AI/ML; we spend a good deal of the day optimizing the...


  • Cupertino, California, United States Amazon Full time

    A Day in the Life: You will spend your days designing and optimizing networking solutions, collaborating with experienced engineers, and engaging with customers to gather feedback and continually improve our offerings. This role offers a unique opportunity to contribute to the development of cutting-edge systems that cater to the needs of our customers in the...


  • Cupertino, California, United States Apple Full time

    About This Opportunity: This is an exciting opportunity to join Apple's Ecosystem Tools team as a Hardware ML Solutions Architect. As a member of this team, you'll work closely with cross-functional partners to develop software that supports the development of Apple's hardware product line. Your primary responsibilities will include writing high-quality code,...


  • Cupertino, California, United States Apple Full time

    Overview: Cupertino, California-based Apple is seeking a highly skilled Data and ML Solutions Architect to join its AI and ML Engineering team. This is an exciting opportunity to design and develop innovative data solutions that empower data scientists, machine learning engineers, and researchers to create transformative products. As a key member of this...


  • Cupertino, California, United States CEREBRAS SYSTEMS INC. Full time

    Job Title: AI Infrastructure Network Engineer. Cerebras Systems is seeking an experienced AI Infrastructure Network Engineer to join our team. Key Responsibilities: Manage and optimize end-to-end network performance of complex AI infrastructure, including servers and switches. Evaluate and recommend servers, switches, and routers for next-generation...


  • Cupertino, California, United States Amazon Full time

    About the Role: We are looking for a Cloud Automation Expert to lead the development of highly automated CI/CD pipelines and cluster ML or HPC applications. Your expertise in Python, TypeScript, CDK, and Linux will enable you to design and architect new and existing systems for high performance and scalability. About the Team: You will join a dedicated team of...


  • Cupertino, California, United States Amazon Full time

    About the Job: We are seeking a highly skilled Deep Learning System Architect to join our team at Amazon. In this role, you will be responsible for designing and implementing business-critical features, publishing cutting-edge research, and contributing to a brilliant team of experienced engineers. You will leverage your technical communications skill...


  • Cupertino, California, United States Amazon Full time

    Key Responsibilities: As a Cloud Network Architect, you will design and implement Linux-based solutions on embedded devices for networking products. You will partner with network engineering, software, and hardware teams to develop reliable networking devices that are the building blocks of the Amazon network. Your expertise in programming languages such as...


  • Cupertino, California, United States Apple Full time

    **Job Description:** We are seeking a highly skilled Senior AI/ML Performance Engineering Specialist to join our team at Apple. This role will be responsible for designing and implementing large-scale data and compute-intensive frameworks/APIs, as well as tools and infrastructure for ML-based product qualification. The ideal candidate will have 5+ years of...


  • Cupertino, California, United States Apple Full time

    At Apple, we're pushing the boundaries of machine learning and artificial intelligence to create groundbreaking technology for large-scale systems, natural language, and AI. If you're passionate about expanding the experience of Siri and other AIML products to new platforms, join our ML Systems Evaluation Engineering team and contribute to a highly...


  • Cupertino, California, United States Apple Full time

    Job Description: As a Performance Framework Software Engineer at Apple, you will be responsible for designing and implementing automation and performance frameworks for evaluating scalable performance measurements of machine learning-based products. Your expertise in Swift, Python, and XCTest will be utilized to develop tools, APIs, and infrastructure for...


  • Cupertino, California, United States Amazon Full time

    About the Job Description: This role involves developing a compiler to handle the world's largest ML workloads, architecting and implementing business-critical features, publishing cutting-edge research, and contributing to a brilliant team of experienced engineers. As an AI Accelerator Software Engineer, you will leverage your technical communications skills...


  • Cupertino, California, United States Apple Full time

    **Job Overview** We are seeking a seasoned Senior Software Engineer with expertise in designing and building cloud-native infrastructure platforms at Apple scale. As a key member of the Machine Learning Platform Team, you will be responsible for architecting and developing scalable cloud-native platforms to support the deployment and operation of Apple's...


  • Cupertino, California, United States Apple Full time

    **About Apple Inc.** We're a company that's proud of our heritage and our reputation for innovation. Our team is dedicated to pushing the boundaries of what's possible, and we're looking for talented individuals like you to join us. **The Role** This is an exciting opportunity to work on cutting-edge sensing technologies as part of our Camera Incubation team....


  • Cupertino, California, United States Apple Full time

    About the Role: We are seeking a highly skilled Sr Software Engineer to join our team at Apple Intelligence, focusing on Machine Learning and Generative AI. As a key member of the ML Systems Evaluation Engineering (MLSEE) team, you will play a vital role in evaluating AIML products, driving innovation, and making significant contributions to the development of...