
What is: All-Attention Layer?

Source: Augmenting Self-attention with Persistent Memory
Year: 2019
Data Source: CC BY-SA - https://paperswithcode.com

An All-Attention Layer is an attention module for transformers that merges the self-attention and feedforward sublayers into a single unified attention layer. As opposed to the two-step mechanism of the standard Transformer layer, it builds its representation directly from the context and a persistent memory block, without going through a separate feedforward transformation. The persistent memory block stores, in the form of learned key-value vectors, information that does not depend on the context. In terms of parameters, these persistent key-value vectors replace the feedforward sublayer.
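The sketch below illustrates the idea under simplifying assumptions: a single attention head, PyTorch as the framework, and arbitrary names and initializations chosen for clarity. It is not the authors' reference implementation; it only shows how learned persistent key-value vectors can be concatenated with the context-dependent keys and values so that one softmax attention replaces both the self-attention and feedforward sublayers.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class AllAttentionLayer(nn.Module):
    """Minimal single-head sketch of an all-attention layer."""

    def __init__(self, d_model: int, n_persistent: int):
        super().__init__()
        self.d_model = d_model
        # Learned projections for the contextual tokens.
        self.q_proj = nn.Linear(d_model, d_model)
        self.k_proj = nn.Linear(d_model, d_model)
        self.v_proj = nn.Linear(d_model, d_model)
        self.out_proj = nn.Linear(d_model, d_model)
        # Persistent memory: learned key/value vectors that do not depend
        # on the input; in parameter count they stand in for the FFN.
        self.persistent_k = nn.Parameter(torch.randn(n_persistent, d_model) * d_model ** -0.5)
        self.persistent_v = nn.Parameter(torch.randn(n_persistent, d_model) * d_model ** -0.5)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        batch = x.shape[0]
        q = self.q_proj(x)            # (B, T, D)
        k_ctx = self.k_proj(x)        # (B, T, D)
        v_ctx = self.v_proj(x)        # (B, T, D)
        # Concatenate context keys/values with the persistent vectors,
        # so attention runs over both in a single softmax.
        k_mem = self.persistent_k.unsqueeze(0).expand(batch, -1, -1)  # (B, N, D)
        v_mem = self.persistent_v.unsqueeze(0).expand(batch, -1, -1)  # (B, N, D)
        k = torch.cat([k_ctx, k_mem], dim=1)   # (B, T+N, D)
        v = torch.cat([v_ctx, v_mem], dim=1)   # (B, T+N, D)
        scores = q @ k.transpose(-2, -1) * self.d_model ** -0.5
        attn = F.softmax(scores, dim=-1)
        out = attn @ v                          # (B, T, D)
        # Residual connection and layer norm; no feedforward sublayer follows.
        return self.norm(x + self.out_proj(out))


# Usage sketch with illustrative sizes.
layer = AllAttentionLayer(d_model=64, n_persistent=16)
tokens = torch.randn(2, 10, 64)
print(layer(tokens).shape)  # torch.Size([2, 10, 64])
```

Because the persistent vectors sit in the same key/value space as the context, the layer needs no second sublayer: the attention weights decide, per query, how much to read from the context and how much from the fixed memory.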