What is: Re-Attention Module?
Source | DeepViT: Towards Deeper Vision Transformer |
Year | 2021 |
Data Source | CC BY-SA - https://paperswithcode.com |
The Re-Attention Module is an attention layer used in the DeepViT architecture that mixes the attention map with a learnable matrix before multiplying with the values. The motivation is to re-generate the attention maps so as to increase their diversity at different layers, at negligible extra computation and memory cost. The authors observe that standard self-attention fails to learn effective concepts for representation learning in the deeper layers of ViT: the attention maps become increasingly similar and less diverse with depth (attention collapse), which prevents the model from achieving the expected performance gains from going deeper. Re-Attention is implemented as:

$$\text{Re-Attention}(Q, K, V) = \mathrm{Norm}\left(\Theta^{\top}\,\mathrm{Softmax}\left(\frac{Q K^{\top}}{\sqrt{d}}\right)\right) V$$

where the learnable transformation matrix $\Theta \in \mathbb{R}^{H \times H}$ is multiplied with the self-attention map along the head dimension ($H$ is the number of attention heads).
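Below is a minimal PyTorch sketch of such a layer for illustration. It is not the authors' reference implementation; the module name `ReAttention`, the identity-plus-noise initialization of $\Theta$, and the use of BatchNorm as the Norm step are assumptions made here for concreteness.

```python
import torch
import torch.nn as nn


class ReAttention(nn.Module):
    """Sketch of a Re-Attention layer: standard multi-head self-attention
    whose attention maps are mixed across heads by a learnable H x H matrix
    Theta before being applied to the values."""

    def __init__(self, dim, num_heads=8):
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.scale = self.head_dim ** -0.5

        self.qkv = nn.Linear(dim, dim * 3, bias=False)
        # Learnable transformation matrix Theta (H x H); initialized near the
        # identity here -- an assumption, not the paper's stated initialization.
        self.theta = nn.Parameter(
            torch.eye(num_heads) + 0.01 * torch.randn(num_heads, num_heads)
        )
        # Norm applied to the re-generated attention map (BatchNorm over the
        # head dimension here; the paper's exact Norm choice may differ).
        self.norm = nn.BatchNorm2d(num_heads)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x):
        B, N, C = x.shape
        qkv = self.qkv(x).reshape(B, N, 3, self.num_heads, self.head_dim)
        q, k, v = qkv.permute(2, 0, 3, 1, 4)           # each: (B, H, N, d)

        attn = (q @ k.transpose(-2, -1)) * self.scale  # (B, H, N, N)
        attn = attn.softmax(dim=-1)

        # Re-Attention: mix the attention maps along the head dimension.
        attn = torch.einsum('hg,bgij->bhij', self.theta, attn)
        attn = self.norm(attn)

        out = (attn @ v).transpose(1, 2).reshape(B, N, C)
        return self.proj(out)
```

As a usage sketch, `ReAttention(dim=384, num_heads=6)` applied to an input of shape `(batch, tokens, 384)` returns an output of the same shape, so it can stand in for the usual self-attention block inside a transformer layer.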