What is: Strip Pooling Network?
Source | Strip Pooling: Rethinking Spatial Pooling for Scene Parsing |
Year | 2020 |
Data Source | CC BY-SA - https://paperswithcode.com |
Spatial pooling usually operates on a small region, which limits its ability to capture long-range dependencies and to attend to distant regions. To overcome this, Hou et al. proposed strip pooling, a pooling method that encodes long-range context along either the horizontal or the vertical spatial dimension.
Strip pooling has two branches, one for horizontal and one for vertical strip pooling. The horizontal branch first pools the input feature $X \in \mathbb{R}^{C \times H \times W}$ in the horizontal direction:
\begin{align} y^1 = \text{GAP}^w (X) \end{align}
Then a 1D convolution with kernel size 3 is applied to $y^1$ to capture the relationship between different rows and channels. The result is repeated $W$ times to make the output consistent with the input shape:
\begin{align} y_h = \text{Expand}(\text{Conv1D}(y^1)) \end{align}
Vertical strip pooling is performed in a similar way, pooling in the vertical direction to give $y_v$. Finally, the outputs of the two branches are fused using element-wise summation to produce the attention map:
\begin{align} s &= \sigma(\text{Conv}^{1\times 1}(y_{v} + y_{h})) \end{align}
\begin{align} Y &= s \odot X \end{align}
where $\sigma$ is the sigmoid function and $\odot$ denotes element-wise multiplication.
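The equations above can be sketched in a few lines of NumPy. This is only an illustrative sketch for a single feature map of shape (C, H, W), not the authors' implementation: the function names, the weight shapes, the zero padding, and the omission of biases and normalization layers are all assumptions made for the example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def conv1d_k3(y, weight):
    """Kernel-size-3 1D convolution (cross-correlation) with zero padding.

    y:      (C_in, L) pooled strip features
    weight: (C_out, C_in, 3), hypothetical weights, no bias
    returns (C_out, L)
    """
    length = y.shape[1]
    padded = np.pad(y, ((0, 0), (1, 1)))  # pad one zero on each side of L
    out = np.zeros((weight.shape[0], length))
    for k in range(3):
        # Mix channels for kernel tap k and accumulate.
        out += weight[:, :, k] @ padded[:, k:k + length]
    return out

def strip_pooling_attention(x, w_h, w_v, w_fuse):
    """Apply the strip pooling attention described above to x of shape (C, H, W).

    w_h, w_v: (C, C, 3) weights of the two 1D convolutions (assumed shapes)
    w_fuse:   (C, C) weights of the 1x1 convolution (assumed shape)
    """
    _, height, width = x.shape

    # Horizontal branch: y^1 = GAP^w(X), average over the width axis.
    y1 = x.mean(axis=2)                                   # (C, H)
    y_h = conv1d_k3(y1, w_h)                              # (C, H)
    y_h = np.repeat(y_h[:, :, None], width, axis=2)       # Expand to (C, H, W)

    # Vertical branch: average over the height axis.
    y2 = x.mean(axis=1)                                   # (C, W)
    y_v = conv1d_k3(y2, w_v)                              # (C, W)
    y_v = np.repeat(y_v[:, None, :], height, axis=1)      # Expand to (C, H, W)

    # A 1x1 convolution is a per-position linear map over channels.
    fused = np.einsum("oc,chw->ohw", w_fuse, y_h + y_v)
    s = sigmoid(fused)                                    # attention map
    return s * x                                          # Y = s (element-wise) X

# Example call with random inputs and weights.
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 5, 6))
w_h = rng.standard_normal((4, 4, 3))
w_v = rng.standard_normal((4, 4, 3))
w_fuse = rng.standard_normal((4, 4))
y = strip_pooling_attention(x, w_h, w_v, w_fuse)
```

Because each pooled strip spans a full row or column, every output position receives context from its entire row and column, which is how the module reaches beyond a small local window.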
The strip pooling module (SPM) is further developed into the mixed pooling module (MPM). Both consider spatial and channel relationships to overcome the locality of convolutional neural networks. At the time of publication, SPNet achieved state-of-the-art results on several challenging semantic segmentation benchmarks.