What is: Attention-augmented Convolution?

Source: Attention Augmented Convolutional Networks
Year: 2019
Data Source: CC BY-SA - https://paperswithcode.com

Attention-augmented Convolution is a type of convolution with a two-dimensional relative self-attention mechanism that can replace convolutions as a stand-alone computational primitive for image classification. It employs scaled dot-product attention and multi-head attention, as in Transformers.
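To make the attention part concrete, here is a minimal NumPy sketch of plain scaled dot-product attention for a single head. The function names `softmax` and `scaled_dot_product_attention` are illustrative, and the sketch assumes the queries, keys and values are already computed. It leaves out the two-dimensional relative position terms mentioned above, so it is not the full mechanism.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(q, k, v):
    """q, k: (N, d_k); v: (N, d_v). Returns the output (N, d_v) and the weights (N, N)."""
    d_k = q.shape[-1]
    logits = q @ k.T / np.sqrt(d_k)      # similarity of every position with every other
    weights = softmax(logits, axis=-1)   # each row sums to 1
    return weights @ v, weights

# Illustrative sizes only: N = 6 positions, d_k = 8, d_v = 4.
rng = np.random.default_rng(0)
q = rng.normal(size=(6, 8))
k = rng.normal(size=(6, 8))
v = rng.normal(size=(6, 4))
out, weights = scaled_dot_product_attention(q, k, v)
print(out.shape)  # (6, 4)
```

Multi-head attention runs several such heads in parallel on lower-dimensional projections and concatenates their outputs.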

It works by concatenating convolutional and attentional feature maps. To see this, consider an original convolution operator with kernel size $k$, $F_{in}$ input filters and $F_{out}$ output filters. The corresponding attention-augmented convolution can be written as:

$$\text{AAConv}\left(X\right) = \text{Concat}\left[\text{Conv}(X), \text{MHA}(X)\right]$$

$X$ originates from an input tensor of shape $\left(H, W, F_{in}\right)$. This is flattened to become $X \in \mathbb{R}^{HW \times F_{in}}$, which is passed into a multi-head attention module, as well as a convolution (see above).

Like a standard convolution, the attention-augmented convolution (1) is equivariant to translation and (2) can readily operate on inputs of different spatial dimensions.
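The construction above can be sketched in NumPy as follows. This is a simplified illustration under stated assumptions, not the reference implementation. The helper names (`conv2d_same`, `multi_head_attention`, `aa_conv`, `init_params`) and the weight layout are hypothetical, the relative position terms are omitted, and the channel budget is assumed to be split so that the convolution produces $F_{out} - d_v$ channels and attention produces $d_v$ channels.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def conv2d_same(x, w):
    """x: (H, W, F_in); w: (k, k, F_in, C) with odd k. Stride 1, zero padding."""
    k = w.shape[0]
    p = k // 2
    H, W, _ = x.shape
    xp = np.pad(x, ((p, p), (p, p), (0, 0)))
    out = np.zeros((H, W, w.shape[-1]))
    for i in range(k):
        for j in range(k):
            # Accumulate the contribution of kernel offset (i, j).
            out += xp[i:i + H, j:j + W, :] @ w[i, j]
    return out

def multi_head_attention(x_flat, wq, wk, wv, wo, num_heads):
    """x_flat: (HW, F_in). Returns (HW, d_v). No position terms (assumption)."""
    n = x_flat.shape[0]

    def split(t):  # (n, d) -> (heads, n, d // heads)
        return t.reshape(n, num_heads, -1).transpose(1, 0, 2)

    q, k, v = split(x_flat @ wq), split(x_flat @ wk), split(x_flat @ wv)
    logits = q @ k.transpose(0, 2, 1) / np.sqrt(q.shape[-1])  # (heads, n, n)
    heads = softmax(logits, axis=-1) @ v                      # (heads, n, d_v // heads)
    merged = heads.transpose(1, 0, 2).reshape(n, -1)          # concatenate the heads
    return merged @ wo

def aa_conv(x, params, num_heads):
    """Concat[Conv(X), MHA(X)] along the channel axis."""
    H, W, f_in = x.shape
    conv_out = conv2d_same(x, params["w_conv"])               # (H, W, F_out - d_v)
    x_flat = x.reshape(H * W, f_in)                           # flatten spatial positions
    attn = multi_head_attention(x_flat, params["wq"], params["wk"],
                                params["wv"], params["wo"], num_heads)
    attn_out = attn.reshape(H, W, -1)                         # (H, W, d_v)
    return np.concatenate([conv_out, attn_out], axis=-1)

def init_params(rng, k, f_in, f_out, d_k, d_v):
    # Random weights for illustration only.
    return {
        "w_conv": 0.1 * rng.normal(size=(k, k, f_in, f_out - d_v)),
        "wq": 0.1 * rng.normal(size=(f_in, d_k)),
        "wk": 0.1 * rng.normal(size=(f_in, d_k)),
        "wv": 0.1 * rng.normal(size=(f_in, d_v)),
        "wo": 0.1 * rng.normal(size=(d_v, d_v)),
    }

rng = np.random.default_rng(0)
params = init_params(rng, k=3, f_in=8, f_out=16, d_k=8, d_v=4)
for h, w in [(5, 7), (9, 4)]:
    y = aa_conv(rng.normal(size=(h, w, 8)), params, num_heads=2)
    print(y.shape)  # (5, 7, 16) then (9, 4, 16)
```

The same parameters are applied to two different spatial sizes, since none of the weights depend on $H$ or $W$. Adding relative position terms to the attention logits would be needed to match the mechanism described in this entry.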