Software Engineer-AI/ML, AWS Neuron Inference

3 weeks ago


Seattle, United States Amazon Web Services (AWS) Full time

AWS Neuron is the complete software stack for the AWS Inferentia and Trainium cloud‑scale machine learning accelerators. This role is for a senior software engineer in the Machine Learning Inference Applications team. The role is responsible for development and performance optimization of the core building blocks of LLM inference: Attention, MLP, Quantization, Speculative Decoding, Mixture of Experts, and more.

The team works side by side with chip architects, compiler engineers, and runtime engineers to deliver performance and accuracy on Neuron devices across a range of models such as Llama 3.3 70B, Llama 3.1 405B, DBRX, and Mixtral.

Key job responsibilities include adapting the latest research in LLM optimization to Neuron chips to extract the best performance from both open source and internally developed models. Working across teams and organizations is key.

About The Team

Our team is dedicated to supporting new members. We have a broad mix of experience levels and tenures, and we’re building an environment that celebrates knowledge‑sharing and mentorship. Our senior members enjoy one‑on‑one mentoring and thorough, but kind, code reviews. We care about your career growth and strive to assign projects that help our team members develop their engineering expertise so they feel empowered to take on more complex tasks in the future.

Basic Qualifications

  • 3+ years of non‑internship professional software development experience
  • 2+ years of non‑internship design or architecture (design patterns, reliability and scaling) of new and existing systems experience
  • Experience programming with at least one software programming language
  • Fundamentals of machine learning models, their architecture, and their training and inference lifecycles, along with work experience on some optimizations for improving model performance

Preferred Qualifications

  • 3+ years of full software development life cycle, including coding standards, code reviews, source control management, build processes, testing, and operations experience
  • Bachelor’s degree in computer science or equivalent
  • Hands‑on experience with PyTorch or JAX, preferably involving developing and deploying LLMs in production on GPUs, Neuron, TPU, or other AI acceleration hardware

Amazon is an equal opportunity employer and does not discriminate on the basis of protected veteran status, disability, or other legally protected status.

Our inclusive culture empowers Amazonians to deliver the best results for our customers. If you have a disability and need a workplace accommodation or adjustment during the application and hiring process, including support for the interview or onboarding process, please visit https://amazon.jobs/content/en/how-we-hire/accommodations for more information. If the country/region you’re applying in isn’t listed, please contact your Recruiting Partner.

Our compensation reflects the cost of labor across several U.S. geographic markets. The base pay for this position ranges from $129,300/year in our lowest geographic market up to $223,600/year in our highest geographic market. Pay is based on a number of factors including market location and may vary depending on job‑related knowledge, skills, and experience. Amazon is a total compensation company. Dependent on the position offered, equity, sign‑on payments, and other forms of compensation may be provided as part of a total compensation package, in addition to a full range of medical, financial, and/or other benefits. For more information, please visit https://www.aboutamazon.com/workplace/employee-benefits. This position will remain posted until filled. Applicants should apply via our internal or external career site.

Job ID: A3067624 | Company: Annapurna Labs (U.S.) Inc.



  • Seattle, Washington, United States AWS Neuron Full time

    AWS Neuron is the complete software stack for the AWS Inferentia and Trainium cloud-scale machine learning accelerators. This role is for a senior software engineer in the Machine Learning Inference Applications team. This role is responsible for development and performance optimization of core building blocks of LLM Inference - Attention, MLP, Quantization,...


  • Seattle, United States Amazon Full time

    The Annapurna Labs team at Amazon Web Services (AWS) builds AWS Neuron, the software development kit used to accelerate deep learning and GenAI workloads on Amazon’s custom machine learning accelerators, Inferentia and Trainium. The AWS Neuron SDK, developed by the Annapurna Labs team at AWS, is the backbone for accelerating deep learning and...

  • Software engineer

    7 days ago


    Seattle, United States Annapurna Labs (U.S.) Inc. Full time

    AWS Neuron is the complete software stack for the AWS Inferentia and Trainium cloud-scale machine learning accelerators. This role is for a senior software engineer in the Machine Learning Inference Applications team. This role is responsible for development and performance optimization of core building blocks of LLM Inference - Attention, MLP, Quantization,...


  • Seattle, WA, United States Amazon Full time

    The Annapurna Labs team at Amazon Web Services (AWS) builds AWS Neuron, the software development kit used to accelerate deep learning and GenAI workloads on Amazon's custom machine learning accelerators, Inferentia and Trainium. The AWS Neuron SDK, developed by the Annapurna Labs team at AWS, is the backbone for accelerating deep learning and...


  • Seattle, United States Amazon.com Services LLC Full time

    The Annapurna Labs team at Amazon Web Services (AWS) builds AWS Neuron, the software development kit used to accelerate deep learning and GenAI workloads on Amazon’s custom machine learning accelerators, Inferentia and Trainium. The AWS Neuron SDK, developed by the Annapurna Labs team at AWS, is the backbone for accelerating deep learning and GenAI...


  • Seattle, WA, United States Amazon Full time

    The Annapurna Labs team at Amazon Web Services (AWS) builds AWS Neuron, the software development kit used to accelerate deep learning and GenAI workloads on Amazon's custom machine learning accelerators, Inferentia and Trainium. The AWS Neuron SDK, developed by the Annapurna Labs team at AWS, is the backbone for accelerating deep learning and...

  • Software engineer

    1 week ago


    Seattle, United States Amazon Full time

    AWS Neuron is the complete software stack for the AWS Inferentia and Trainium cloud-scale machine learning accelerators. This role is for a senior software engineer in the Machine Learning Inference Applications team. This role is responsible for development and performance optimization of core building blocks of LLM Inference - Attention, MLP, Quantization,...


  • Seattle, WA, United States Amazon Full time

    The Annapurna Labs team at Amazon Web Services (AWS) builds AWS Neuron, the software development kit used to accelerate deep learning and GenAI workloads on Amazon's custom machine learning accelerators, Inferentia and Trainium. The AWS Neuron SDK, developed by the Annapurna Labs team at AWS, is the backbone for accelerating deep learning and GenAI workloads...

