Systems Development Engineer III, Annapurna Labs Infrastructure

4 weeks ago


McNeil, TX, United States · Annapurna Labs (U.S.) Inc. · Full time
Annapurna Labs, our organization within AWS, is responsible for building innovation in silicon and software for AWS customers. With development centers in the U.S. and Israel, Annapurna is at the forefront of innovation by combining cloud scale with the world’s most talented engineers. Our team covers multiple disciplines including silicon engineering, hardware design and verification, software, and operations. Because of our teams’ breadth of talent, we’ve been able to improve AWS cloud infrastructure in networking and security with products such as AWS Nitro, Enhanced Network Adapter (ENA), and Elastic Fabric Adapter (EFA), in compute with AWS Graviton and F1 EC2 Instances, in machine learning with AWS Neuron, Inferentia and Trainium ML Accelerators, and in storage with scalable NVMe.
As part of the Annapurna Labs team, you’ll have the opportunity to invent the next generation of cloud computing infrastructure. You’ll experience what it’s like to work in a fast-paced, innovative, startup-like environment filled with some of the brightest minds in the industry. The work we do is not only cutting-edge and internet-scale but also deeply important to our customers. We design and build every component of our hardware and software to come together into products that our customers use for accelerated computing: either machine learning acceleration or FPGA acceleration. We get our hands dirty, from creating our own silicon, to ensuring our hardware is functional and healthy, to managing the full lifecycle of our systems at the huge scale and complexity of AWS.

If you want a career that makes an impact, lets you invent, and gives you first-hand visibility into how your implementations delight customers, then we have a role for you.
If you're interested in being on a team that is "building a complete product" from inception to delighted customers, Annapurna is a fantastic choice.
Join us in creating the most advanced Machine Learning Accelerators in the world.

Key job responsibilities
As a technical leader of the Cloud-Scale Machine Learning Acceleration Infrastructure team, you’ll be responsible for architecting and leading development of the infrastructure used by our engineering teams. Those engineering teams are your customers: they build the hardware and software for our custom-designed machine learning products, AWS Inferentia2 and Trainium, which run in our data centers.
You will need to lead across teams to develop and execute in-depth infrastructure development plans that enable the engineering development of the Machine Learning Acceleration product family. You will dive deep to solve critical infrastructure issues involving networking, high performance compute clusters, infrastructure automation of hardware/software/firmware testing, and ASIC/EDA development. You will execute and scale the next generation of cloud infrastructure based on cloud frameworks and AWS services. You will own design reviews for infrastructure development and partner with AWS service teams and vendors. You will influence your team, your customers, and AWS service teams to help drive and develop the technical implementation of overall system designs. You will identify and implement process improvements that increase your team’s agility and operational efficiency, including improvements to design, automation, development, testing, or operations. You will define new mechanisms for system health monitoring, diagnostics, repair, and automation. You will develop, document, and update operational runbooks as you participate in on-call rotations.

A day in the life
Each day you will work with the best engineers in the industry to develop Machine Learning Accelerators. On-site in Austin, Texas, you will be a part of the team that develops custom silicon, and you will own the infrastructure that enables this innovation. Take a look inside our labs to see what you will learn at Annapurna Labs:
https://www.aboutamazon.com/news/aws/take-a-look-inside-the-lab-where-aws-makes-custom-chips
https://youtu.be/rViVFrQg4Hk

We are open to hiring candidates to work out of one of the following locations:

Austin, TX, USA
BASIC QUALIFICATIONS
- 5+ years of programming experience with at least one modern language such as C++, C#, Java, Python, Golang, PowerShell, or Ruby
- 3+ years of non-internship professional software development experience
- 5+ years of experience designing or architecting new and existing systems (design patterns, reliability, and scaling)
- 5+ years of experience deploying and operating in a Linux/Unix environment
- 3+ years of systems design, software development, operations, automation, and process improvement experience
- Experience leading the design, build, and deployment of complex, performant (reliable and scalable) software solutions in production
- 3+ years of systems development experience in an IT or data center environment
- Experience debugging complex issues across HW/SW, networking, and storage systems
- Experience operating large-scale infrastructure deployments, including improving operational excellence
PREFERRED QUALIFICATIONS
- Knowledge of engineering practices and patterns for the full software/hardware/network development life cycle, including coding standards, code reviews, source control management, build processes, testing, certification, and live site operations
- Experience taking a leading role in building complex software or computing infrastructure that has been successfully delivered to customers
- Experience writing technical documents, project plans, and progress reports for leadership and stakeholders
- Experience with AWS Cloud Infrastructure deployments using CDK
- Experience with IT security software/tools/standards

Amazon is committed to a diverse and inclusive workplace. Amazon is an equal opportunity employer and does not discriminate on the basis of race, national origin, gender, gender identity, sexual orientation, protected veteran status, disability, age, or other legally protected status. For individuals with disabilities who would like to request an accommodation, please visit https://www.amazon.jobs/en/disability/us.