What is: Coordinate attention?
Source | Coordinate Attention for Efficient Mobile Network Design |
Year | 2021 |
Data Source | CC BY-SA - https://paperswithcode.com |
Hou et al. proposed coordinate attention, a novel attention mechanism that embeds positional information into channel attention, so that the network can focus on large important regions at little computational cost.
The coordinate attention mechanism has two consecutive steps: coordinate information embedding and coordinate attention generation. First, two pooling kernels with directional spatial extents encode each channel along the horizontal and vertical coordinates, respectively. In the second step, a shared convolutional transformation function is applied to the concatenated outputs of the two pooling layers. Coordinate attention then splits the resulting tensor into two separate tensors, yielding attention vectors with the same number of channels for the horizontal and vertical coordinates of the input. This can be written as \begin{align} z^h &= \text{GAP}^h(X) \\ z^w &= \text{GAP}^w(X) \\ f &= \delta(\text{BN}(\text{Conv}_1^{1\times 1}([z^h;z^w]))) \\ f^h, f^w &= \text{Split}(f) \\ s^h &= \sigma(\text{Conv}_h^{1\times 1}(f^h)) \\ s^w &= \sigma(\text{Conv}_w^{1\times 1}(f^w)) \\ Y &= X \, s^h \, s^w \end{align} where $\text{GAP}^h$ and $\text{GAP}^w$ denote pooling functions for vertical and horizontal coordinates, and $s^h$ and $s^w$ represent the corresponding attention weights.
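The two steps above map directly onto a small module. Below is a minimal sketch in PyTorch, assuming an NCHW input tensor; the module name `CoordinateAttention`, the `reduction` ratio, and the ReLU non-linearity are illustrative assumptions rather than the exact configuration from the paper.

```python
import torch
import torch.nn as nn


class CoordinateAttention(nn.Module):
    """Sketch of coordinate attention for an NCHW feature map (illustrative, not the reference code)."""

    def __init__(self, channels: int, reduction: int = 32):
        super().__init__()
        mid = max(8, channels // reduction)  # reduced channel count for the shared transform
        # Shared 1x1 convolution applied to the concatenated pooled features
        self.conv1 = nn.Conv2d(channels, mid, kernel_size=1)
        self.bn = nn.BatchNorm2d(mid)
        self.act = nn.ReLU(inplace=True)  # delta: non-linear activation
        # Separate 1x1 convolutions producing the horizontal and vertical attention maps
        self.conv_h = nn.Conv2d(mid, channels, kernel_size=1)
        self.conv_w = nn.Conv2d(mid, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        n, c, h, w = x.shape
        # Coordinate information embedding: directional average pooling
        z_h = x.mean(dim=3, keepdim=True)                        # N x C x H x 1
        z_w = x.mean(dim=2, keepdim=True).permute(0, 1, 3, 2)    # N x C x W x 1
        # Coordinate attention generation: shared transform on the concatenation
        f = self.act(self.bn(self.conv1(torch.cat([z_h, z_w], dim=2))))
        f_h, f_w = torch.split(f, [h, w], dim=2)
        s_h = torch.sigmoid(self.conv_h(f_h))                      # N x C x H x 1
        s_w = torch.sigmoid(self.conv_w(f_w.permute(0, 1, 3, 2)))  # N x C x 1 x W
        # Reweight the input along both coordinate directions
        return x * s_h * s_w


if __name__ == "__main__":
    att = CoordinateAttention(channels=64)
    feat = torch.randn(2, 64, 56, 56)
    print(att(feat).shape)  # torch.Size([2, 64, 56, 56]) -- output keeps the input shape
```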
Using coordinate attention, the network can accurately locate the position of a targeted object. This approach has a larger receptive field than BAM and CBAM. Like an SE block, it also models cross-channel relationships, effectively enhancing the expressive power of the learned features. Due to its lightweight design and flexibility, it can easily be plugged into the classical building blocks of mobile networks.