What is: All-Attention Layer?
Source | Augmenting Self-attention with Persistent Memory |
Year | 2019 |
Data Source | CC BY-SA - https://paperswithcode.com |
An All-Attention Layer is an attention module for transformers that merges the self-attention and feedforward sublayers into a single unified attention layer. Unlike the two-step mechanism of the standard Transformer layer, it builds its representation directly from the context and a persistent memory block, without a separate feedforward transformation. The persistent memory block stores, in the form of learned key-value vectors, information that does not depend on the context. In terms of parameters, these persistent key-value vectors replace the feedforward sublayer.
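
Below is a minimal, single-head sketch of this idea in PyTorch: learned persistent key-value vectors are concatenated with the context keys and values, and the layer's output comes from attention alone, with no feedforward sublayer. Names, shapes, and initialization here are illustrative assumptions; the paper additionally uses multi-head attention, relative position encodings, and causal masking, which are omitted for brevity.

```python
import math
import torch
import torch.nn as nn


class AllAttentionLayer(nn.Module):
    """Sketch of an all-attention layer with persistent memory (assumed API)."""

    def __init__(self, dim: int, num_persistent: int):
        super().__init__()
        self.dim = dim
        # Learned, context-independent key/value vectors; in parameter count,
        # these play the role of the removed feedforward sublayer.
        self.persistent_k = nn.Parameter(torch.randn(num_persistent, dim) / math.sqrt(dim))
        self.persistent_v = nn.Parameter(torch.randn(num_persistent, dim) / math.sqrt(dim))
        self.q_proj = nn.Linear(dim, dim, bias=False)
        self.k_proj = nn.Linear(dim, dim, bias=False)
        self.v_proj = nn.Linear(dim, dim, bias=False)
        self.out_proj = nn.Linear(dim, dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, dim)
        b, t, d = x.shape
        q = self.q_proj(x)                                  # (b, t, d)
        # Keys/values computed from the context...
        k_ctx, v_ctx = self.k_proj(x), self.v_proj(x)       # (b, t, d)
        # ...concatenated with the persistent (context-independent) vectors.
        k_mem = self.persistent_k.unsqueeze(0).expand(b, -1, -1)
        v_mem = self.persistent_v.unsqueeze(0).expand(b, -1, -1)
        k = torch.cat([k_ctx, k_mem], dim=1)                # (b, t + N, d)
        v = torch.cat([v_ctx, v_mem], dim=1)
        # Standard scaled dot-product attention over context + persistent memory.
        attn = torch.softmax(q @ k.transpose(-2, -1) / math.sqrt(d), dim=-1)
        return self.out_proj(attn @ v)                      # no feedforward sublayer
```

A usage example under these assumptions: `AllAttentionLayer(dim=512, num_persistent=1024)(torch.randn(2, 16, 512))` returns a tensor of shape `(2, 16, 512)`; the persistent slots simply extend the set of positions the queries can attend to.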