Software Engineer, ML Performance

7 days ago

Cupertino, California, United States OpenReq Full time $120,000 - $240,000 per year

About Etched

Etched is building AI chips that are hard-coded for individual model architectures. Our first product (Sohu) only supports transformers, but has an order of magnitude more throughput and lower latency than a B200. With Etched ASICs, you can build products that would be impossible with GPUs, like real-time video generation models and extremely deep chain-of-thought reasoning.

Running millions of tokens per second for large models (e.g., Llama-3-70B) means running into new performance bottlenecks. Even with hardware optimization for the operations that usually bottleneck us (attention, kernel parallelism), we encounter novel bottlenecks and must design our own solutions.

You will work closely with our hardware and software teams to identify and mitigate performance bottlenecks, enabling our chips to achieve unprecedented throughput and efficiency. Your work will involve a blend of low-level programming, performance profiling, and hands-on debugging, all aimed at maximizing the performance of our custom-built AI hardware.

You will also play a key role in developing tools and methodologies to help our customers understand the full potential of our hardware.

Representative projects:

Writing new kernels to improve throughput for LLM embedding
Improving on PagedAttention to prevent fragmentation of the KV cache in memory
Debugging hardware issues on a simulated or emulated chip
Profiling transformers running on our hardware and fixing bottlenecks
Developing ways for customers to work with our chip and understand how their workloads will run on it

You may be a good fit if you:

Have 5+ years of low-level programming experience
Have a strong understanding of data flow and execution paths within embedded systems
Pick up slack, even if it goes outside your job description
Are results-oriented and biased toward shipping products
Understand SoC and computer system architecture, especially for CPU, interconnect, and memory subsystems
Want to learn more about machine learning research

We encourage you to apply even if you do not believe you meet every single qualification.

Strong candidates may also have experience with:

GPU kernel profiling and low-level programming
Transformer optimizations, such as FlashAttention
Ongoing research in machine learning
Palladium emulation

How we're different:

Etched believes in the Bitter Lesson. We think most of the progress in the AI field has come from using more FLOPs to train and run models, and the best way to get more FLOPs is to build model-specific hardware. Larger and larger training runs encourage companies to consolidate around fewer model architectures, which creates a market for single-model ASICs.

We are a fully in-person team in Cupertino, and greatly value engineering skills. We do not have boundaries between engineering and research, and we expect all of our technical staff to contribute to both as needed.

Benefits:

Full medical, dental, and vision packages, with 100% of premiums covered for employees and 90% for dependents
Housing subsidy of $2,000/month for those living within walking distance of the office
Daily lunch and dinner in our office
Relocation support for those moving to Cupertino

