What is: Bottleneck Attention Module?
Source | BAM: Bottleneck Attention Module |
Year | 2018 |
Data Source | CC BY-SA - https://paperswithcode.com |
Park et al. proposed the bottleneck attention module (BAM), aiming to efficiently improve the representational capability of networks. It uses dilated convolutions to enlarge the receptive field of the spatial attention sub-module, and builds a bottleneck structure, as suggested by ResNet, to save computational cost.
For a given input feature map $X \in \mathbb{R}^{C \times H \times W}$, BAM infers the channel attention $s_c \in \mathbb{R}^{C}$ and spatial attention $s_s \in \mathbb{R}^{H \times W}$ in two parallel streams, then sums the two attention maps after resizing both branch outputs to $\mathbb{R}^{C \times H \times W}$. The channel attention branch, like an SE block, applies global average pooling to the feature map to aggregate global information, and then uses an MLP with channel dimensionality reduction. In order to utilize contextual information effectively, the spatial attention branch combines a bottleneck structure and dilated convolutions. Overall, BAM can be written as \begin{align} s_c &= \text{BN}(W_2(W_1\text{GAP}(X)+b_1)+b_2) \end{align}
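To make the channel branch concrete, the following is a minimal PyTorch sketch of the $s_c$ computation above (global average pooling, a two-layer MLP with channel reduction, then batch normalization). The reduction ratio of 16 and the ReLU between the two fully connected layers are assumptions in the style of SE-type MLPs rather than details stated in the formula; class and argument names are illustrative.

```python
import torch
import torch.nn as nn


class BAMChannelAttention(nn.Module):
    """Channel branch: GAP -> MLP with channel reduction -> BN (pre-sigmoid logits)."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.gap = nn.AdaptiveAvgPool2d(1)                  # GAP(X): global average pooling
        self.mlp = nn.Sequential(                           # W_2(W_1 GAP(X) + b_1) + b_2
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),                          # assumed nonlinearity between the FC layers
            nn.Linear(channels // reduction, channels),
        )
        self.bn = nn.BatchNorm1d(channels)                  # BN(...) as in the s_c equation

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        s_c = self.mlp(self.gap(x).view(b, c))              # (B, C)
        s_c = self.bn(s_c)
        return s_c.view(b, c, 1, 1)                         # shaped for later expansion to (B, C, H, W)
```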
\begin{align} s_s &= \text{BN}(\text{Conv}_2^{1 \times 1}(\text{DC}_2^{3\times 3}(\text{DC}_1^{3 \times 3}(\text{Conv}_1^{1 \times 1}(X))))) \end{align} \begin{align} s &= \sigma(\text{Expand}(s_s)+\text{Expand}(s_c)) \end{align} \begin{align} Y &= s X+X \end{align} where $W_1$, $W_2$ and $b_1$, $b_2$ denote the weights and biases of the fully connected layers respectively, $\text{Conv}_1^{1 \times 1}$ and $\text{Conv}_2^{1 \times 1}$ are convolution layers used for channel reduction, $\text{DC}_i^{3 \times 3}$ denotes a dilated convolution with a $3 \times 3$ kernel, applied to utilize contextual information effectively, and $\text{Expand}$ expands the attention maps $s_s$ and $s_c$ to $\mathbb{R}^{C \times H \times W}$.
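The spatial branch and the combination step can be sketched the same way, reusing the channel branch above. The dilation rate of 4, the ReLU nonlinearities after the dilated convolutions, and the reduction ratio are assumptions in the spirit of the paper's default hyperparameters rather than values fixed by the equations; broadcasting of the (B, C, 1, 1) and (B, 1, H, W) tensors plays the role of $\text{Expand}$.

```python
class BAMSpatialAttention(nn.Module):
    """Spatial branch: 1x1 reduce -> two 3x3 dilated convs -> 1x1 to one map -> BN."""

    def __init__(self, channels: int, reduction: int = 16, dilation: int = 4):
        super().__init__()
        mid = channels // reduction
        self.body = nn.Sequential(
            nn.Conv2d(channels, mid, kernel_size=1),                                   # Conv_1^{1x1}: channel reduction
            nn.Conv2d(mid, mid, kernel_size=3, padding=dilation, dilation=dilation),   # DC_1^{3x3}
            nn.ReLU(inplace=True),
            nn.Conv2d(mid, mid, kernel_size=3, padding=dilation, dilation=dilation),   # DC_2^{3x3}
            nn.ReLU(inplace=True),
            nn.Conv2d(mid, 1, kernel_size=1),                                          # Conv_2^{1x1}: single spatial map
            nn.BatchNorm2d(1),                                                          # BN(...)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.body(x)                                  # (B, 1, H, W)


class BAM(nn.Module):
    """Y = sigmoid(Expand(s_s) + Expand(s_c)) * X + X."""

    def __init__(self, channels: int, reduction: int = 16, dilation: int = 4):
        super().__init__()
        self.channel_att = BAMChannelAttention(channels, reduction)
        self.spatial_att = BAMSpatialAttention(channels, reduction, dilation)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Broadcasting (B, C, 1, 1) + (B, 1, H, W) -> (B, C, H, W) performs Expand.
        s = torch.sigmoid(self.channel_att(x) + self.spatial_att(x))
        return s * x + x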
BAM can emphasize or suppress features in both the spatial and channel dimensions, as well as improve the representational power. The dimensionality reduction applied to both the channel and spatial attention branches enables it to be integrated with any convolutional neural network at little extra computational cost. However, although dilated convolutions enlarge the receptive field effectively, BAM still fails to capture long-range contextual information and to encode cross-domain relationships.
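Because the module preserves the shape of its input feature map, it can be dropped between stages of an existing CNN without any other change to the backbone. A minimal smoke test of the sketch above (the tensor sizes here are illustrative stand-ins for a stage output):

```python
# Hypothetical usage: BAM keeps the (B, C, H, W) shape, so it can be inserted
# at stage boundaries (e.g. after each ResNet stage) without touching the backbone.
feat = torch.randn(2, 256, 56, 56)      # stand-in for a stage's output feature map
bam = BAM(channels=256)
out = bam(feat)
assert out.shape == feat.shape          # (2, 256, 56, 56)
```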