What is: Strip Pooling Network?

Source: Strip Pooling: Rethinking Spatial Pooling for Scene Parsing
Year: 2020
Data Source: CC BY-SA - https://paperswithcode.com

Spatial pooling usually operates on a small region, which limits its ability to capture long-range dependencies and to attend to distant regions. To overcome this, Hou et al. proposed strip pooling, a pooling method that encodes long-range context along either the horizontal or the vertical spatial dimension.

Strip pooling has two branches, one for horizontal and one for vertical strip pooling. The horizontal branch first pools the input feature $X \in \mathcal{R}^{C \times H \times W}$ in the horizontal direction, averaging each row over its width:
\begin{align} y^1 = \text{GAP}^w (X) \end{align}
Then a 1D convolution with kernel size 3 is applied to $y^1$ to capture the relationship between different rows and channels. The result is repeated $W$ times to make the output $y_h$ consistent with the input shape:
\begin{align} y_h = \text{Expand}(\text{Conv1D}(y^1)) \end{align}
Vertical strip pooling is performed in a similar way and yields $y_v$. Finally, the outputs of the two branches are fused using element-wise summation to produce the attention map $s$, which rescales the input element-wise:
\begin{align} s &= \sigma(\text{Conv}^{1\times 1}(y_{v} + y_{h})) \end{align}
\begin{align} Y &= s X \end{align}
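The equations above can be traced step by step in a small NumPy sketch. This is only an illustration of the formulas as written here, not the authors' reference implementation: the function names (`conv1d_k3`, `strip_pooling`), the weight shapes, the zero padding, and the omission of biases and normalization layers are all assumptions made for the example, and the weights are random rather than learned.

```python
import numpy as np

def conv1d_k3(y, weight):
    """Kernel-size-3 1D convolution with zero padding (assumed), mixing channels.

    y: (C_in, L) pooled strip; weight: (C_out, C_in, 3). Returns (C_out, L).
    """
    length = y.shape[1]
    padded = np.pad(y, ((0, 0), (1, 1)))  # zero-pad along the strip axis
    # windows[c, k, l] = padded[c, l + k], i.e. the three taps at each position
    windows = np.stack([padded[:, k:k + length] for k in range(3)], axis=1)
    return np.einsum("ock,ckl->ol", weight, windows)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def strip_pooling(x, w_h, w_v, w_fuse):
    """x: (C, H, W); w_h, w_v: (C, C, 3); w_fuse: (C, C) acts as the 1x1 conv."""
    _, h, w = x.shape
    # Horizontal branch: y^1 = GAP^w(X), one average per row -> (C, H)
    y1 = x.mean(axis=2)
    # y_h = Expand(Conv1D(y^1)): repeat W times along the width -> (C, H, W)
    y_h = np.repeat(conv1d_k3(y1, w_h)[:, :, None], w, axis=2)
    # Vertical branch: average each column over the height -> (C, W),
    # then repeat H times along the height -> (C, H, W)
    y2 = x.mean(axis=1)
    y_v = np.repeat(conv1d_k3(y2, w_v)[:, None, :], h, axis=1)
    # s = sigmoid(Conv1x1(y_v + y_h)): a 1x1 conv is per-pixel channel mixing
    s = sigmoid(np.einsum("oc,chw->ohw", w_fuse, y_v + y_h))
    # Y = s X (element-wise)
    return s * x

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 6, 8))          # C=4, H=6, W=8
w_h = 0.1 * rng.standard_normal((4, 4, 3))  # random stand-ins for learned weights
w_v = 0.1 * rng.standard_normal((4, 4, 3))
w_fuse = 0.1 * rng.standard_normal((4, 4))
y = strip_pooling(x, w_h, w_v, w_fuse)
print(y.shape)  # (4, 6, 8)
```

Because each attention value depends on a full row average and a full column average, every output position is influenced by features far away along both axes, which is the long-range behaviour the text describes.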

The strip pooling module (SPM) is further developed in the mixed pooling module (MPM). Both consider spatial and channel relationships to overcome the locality of convolutional neural networks. SPNet reported state-of-the-art results at the time of publication on several semantic segmentation benchmarks.