Software Engineer, ML Performance
7 days ago
Etched is building AI chips that are hard-coded for individual model architectures. Our first product (Sohu) only supports transformers, but has an order of magnitude more throughput and lower latency than a B200. With Etched ASICs, you can build products that would be impossible with GPUs, like real-time video generation models and extremely deep chain-of-thought reasoning.
Running millions of tokens per second for large models (e.g., Llama-3-70B) means running into new performance bottlenecks. Even with hardware optimization for the operations that usually bottleneck us (attention, kernel parallelism), we encounter novel bottlenecks and must design our own solutions to them.
You will work closely with our hardware and software teams to identify and mitigate performance bottlenecks, enabling our chips to achieve unprecedented throughput and efficiency. Your work will involve a blend of low-level programming, performance profiling, and hands-on debugging, all aimed at maximizing the performance of our custom-built AI hardware.
You will also play a key role in developing tools and methodologies to help our customers understand the full potential of our hardware.
Representative projects:
- Writing new kernels to improve throughput for LLM embedding
- Improving on PagedAttention to prevent fragmentation of the KV cache in memory
- Debugging hardware issues on a simulated or emulated chip
- Profiling transformers running on our hardware and fixing bottlenecks
- Developing ways for customers to work with our chip and understand how their workloads will run on it
You may be a good fit if you:
- Have 5+ years of low-level programming experience
- Have a strong understanding of data flow and execution paths within embedded systems
- Pick up slack, even if it goes outside your job description
- Are results-oriented and have a bias toward shipping products
- Understand SoC and computer system architecture, especially for CPU, interconnect, and memory subsystems
- Want to learn more about machine learning research
We encourage you to apply even if you do not believe you meet every single qualification.
Strong candidates may also have experience with:
- GPU kernel profiling and low-level programming
- Transformer optimizations, such as FlashAttention
- Ongoing research in machine learning
- Palladium emulation
How we're different:
Etched believes in the Bitter Lesson. We think most of the progress in the AI field has come from using more FLOPs to train and run models, and the best way to get more FLOPs is to build model-specific hardware. Larger and larger training runs encourage companies to consolidate around fewer model architectures, which creates a market for single-model ASICs.
We are a fully in-person team in Cupertino, and greatly value engineering skills. We do not have boundaries between engineering and research, and we expect all of our technical staff to contribute to both as needed.
Benefits:
- Full medical, dental, and vision packages, with 100% of premiums covered (90% for dependents)
- Housing subsidy of $2,000/month for those living within walking distance of the office
- Daily lunch and dinner in our office
- Relocation support for those moving to Cupertino