What is: Strip Pooling Network?

Spatial pooling usually operates on a small region which limits its capability to capture long-range dependencies and focus on distant regions. To overcome this, Hou et al. proposed strip pooling, a novel pooling method capable of encoding long-range context in either horizontal or vertical spatial domains.

Strip pooling has two branches for horizontal and vertical strip pooling. The horizontal strip pooling part first pools the input feature $F \in \mathcal{R}^{C \times H \times W}$ in the horizontal direction: \begin{align} y^1 = \text{GAP}^w (X) \end{align} Then a 1D convolution with kernel size 3 is applied in $y$ to capture the relationship between different rows and channels. This is repeated $W$ times to make the output $y_v$ consistent with the input shape: \begin{align} y_h = \text{Expand}(\text{Conv1D}(y^1)) \end{align} Vertical strip pooling is performed in a similar way. Finally, the outputs of the two branches are fused using element-wise summation to produce the attention map: \begin{align} s &= \sigma(Conv^{1\times 1}(y_{v} + y_{h})) \end{align} \begin{align} Y &= s X \end{align}

The strip pooling module (SPM) is further developed in the mixed pooling module (MPM). Both consider spatial and channel relationships to overcome the locality of convolutional neural networks. SPNet achieves state-of-the-art results for several complex semantic segmentation benchmarks.

Source	Strip Pooling: Rethinking Spatial Pooling for Scene Parsing
Year	2000
Data Source	CC BY-SA - https://paperswithcode.com