What is: Spatial and Channel SE Blocks?

To aggregate global spatial information, an SE block applies global pooling to the feature map. However, this discards pixel-wise spatial information, which is important in dense prediction tasks. Roy et al. therefore proposed spatial and channel SE blocks (scSE). As in BAM, a spatial SE block complements the channel SE block by providing spatial attention weights that focus on important regions.

Given the input feature map $X$, two parallel modules, spatial SE and channel SE, are applied to feature maps to encode spatial and channel information respectively. The channel SE module is an ordinary SE block, while the spatial SE module adopts $1\times 1$ convolution for spatial squeezing. The outputs from the two modules are fused. The overall process can be written as

\begin{align}
s_c &= \sigma(W_{2}\,\delta(W_{1}\,\text{GAP}(X))) \\
X_\text{chn} &= s_c X \\
s_s &= \sigma(\text{Conv}^{1\times 1}(X)) \\
X_\text{spa} &= s_s X \\
Y &= f(X_\text{spa}, X_\text{chn})
\end{align}

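where $\text{GAP}$ is global average pooling, $\delta$ is the ReLU activation and $\sigma$ is the sigmoid function, as in the standard SE block, and $f$ denotes the fusion function, which can be maximum, addition, multiplication or concatenation.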
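
The equations above can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the authors' implementation: it assumes a single feature map of shape `(C, H, W)`, omits bias terms, takes $\delta$ to be ReLU, and uses random placeholder weights. The function names (`channel_se`, `spatial_se`, `scse`) and the reduction ratio `r` are hypothetical names chosen for this example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def channel_se(x, w1, w2):
    """Channel SE: x is (C, H, W), w1 is (C//r, C), w2 is (C, C//r)."""
    z = x.mean(axis=(1, 2))                       # GAP(X), shape (C,)
    s_c = sigmoid(w2 @ np.maximum(w1 @ z, 0.0))   # sigma(W2 ReLU(W1 z)), shape (C,)
    return s_c[:, None, None] * x                 # one weight per channel

def spatial_se(x, w_conv):
    """Spatial SE: w_conv is (C,), the weights of a 1x1 conv with one output channel."""
    s_s = sigmoid(np.tensordot(w_conv, x, axes=([0], [0])))  # shape (H, W)
    return s_s[None, :, :] * x                    # one weight per pixel

def scse(x, w1, w2, w_conv, fusion="max"):
    """Fuse the two recalibrated maps with the chosen fusion function f."""
    x_chn = channel_se(x, w1, w2)
    x_spa = spatial_se(x, w_conv)
    if fusion == "max":
        return np.maximum(x_spa, x_chn)
    if fusion == "add":
        return x_spa + x_chn
    if fusion == "mul":
        return x_spa * x_chn
    if fusion == "concat":
        return np.concatenate([x_spa, x_chn], axis=0)  # doubles the channel count
    raise ValueError(f"unknown fusion: {fusion}")

# Placeholder sizes and random weights, for illustration only.
rng = np.random.default_rng(0)
C, H, W, r = 8, 4, 4, 2
x = rng.standard_normal((C, H, W))
w1 = rng.standard_normal((C // r, C))
w2 = rng.standard_normal((C, C // r))
w_conv = rng.standard_normal(C)

y = scse(x, w1, w2, w_conv, fusion="max")
print(y.shape)  # (8, 4, 4)
```

Note that every fusion choice except concatenation keeps the output the same shape as $X$, so the block can be dropped between existing layers; concatenation doubles the number of channels and requires the following layer to be adjusted accordingly.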

The scSE block combines channel and spatial attention to enhance features while also capturing pixel-wise spatial information, which particularly benefits segmentation tasks. Integrating scSE blocks into fully convolutional networks (F-CNNs) yields a consistent improvement in semantic segmentation at negligible extra cost.
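
To give a rough sense of why the extra cost is small, the sketch below counts the weights an scSE block adds and compares them with a single $3\times 3$ convolution of the same width. It is back-of-the-envelope arithmetic under stated assumptions, not a figure from the paper: biases are ignored, and the channel count `C = 64` and reduction ratio `r = 16` are hypothetical values picked for illustration.

```python
def scse_extra_params(channels, reduction):
    """Weights added by one scSE block (biases ignored)."""
    hidden = channels // reduction
    cse = channels * hidden + hidden * channels  # W1 and W2 of the channel SE branch
    sse = channels                               # 1x1 conv from C channels to 1
    return cse + sse

def conv3x3_params(channels):
    """Weights of a 3x3 convolution mapping C channels to C channels."""
    return 9 * channels * channels

C, r = 64, 16  # hypothetical values
extra = scse_extra_params(C, r)
base = conv3x3_params(C)
print(extra, base, extra / base)  # 576 36864 0.015625
```

Under these assumptions the block adds under 2% of the weights of one same-width convolutional layer; the ratio grows as `r` shrinks, since the channel SE branch scales as $2C^2/r$.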

Source	Recalibrating Fully Convolutional Networks with Spatial and Channel 'Squeeze & Excitation' Blocks
Year	2018
Data Source	CC BY-SA - https://paperswithcode.com