What is: Spatial and Channel SE Blocks?

Source: Recalibrating Fully Convolutional Networks with Spatial and Channel 'Squeeze & Excitation' Blocks
Year: 2018
Data Source: CC BY-SA - https://paperswithcode.com

To aggregate global spatial information, an SE block applies global pooling to the feature map. This squeeze step discards pixel-wise spatial information, which is important in dense prediction tasks such as segmentation. Roy et al. therefore proposed spatial and channel SE (scSE) blocks. As in BAM (Bottleneck Attention Module), a spatial SE block complements the channel SE block by producing spatial attention weights that focus on important regions.
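The loss of spatial detail under global pooling can be illustrated with a minimal NumPy sketch. The toy feature maps and the helper name `global_average_pool` are hypothetical and chosen only for illustration; they do not come from the paper.

```python
import numpy as np

# Hypothetical toy feature map: 1 channel on a 2x2 spatial grid, shape (C, H, W).
x = np.array([[[1.0, 0.0],
               [0.0, 0.0]]])

# Same values, but the active pixel has moved to the opposite corner.
x_moved = np.array([[[0.0, 0.0],
                     [0.0, 1.0]]])

def global_average_pool(feature_map):
    """Average each channel over its spatial dimensions: (C, H, W) -> (C,)."""
    return feature_map.mean(axis=(1, 2))

gap_a = global_average_pool(x)
gap_b = global_average_pool(x_moved)

# Both maps squeeze to the same channel descriptor, so a channel-only
# gate cannot tell where in the image the activation occurred.
print(gap_a, gap_b)
```

Because the two descriptors are identical, any channel weights computed from them are identical too, which is the gap the spatial branch is meant to fill.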

Given the input feature map $X$, two parallel modules, spatial SE and channel SE, are applied to the feature map to encode spatial and channel information respectively. The channel SE module is an ordinary SE block, while the spatial SE module adopts a $1\times 1$ convolution for spatial squeezing. The outputs of the two modules are then fused. The overall process can be written as

\begin{align}
s_c &= \sigma(W_{2} \delta(W_{1}\text{GAP}(X))) \\
X_\text{chn} &= s_c X \\
s_s &= \sigma(\text{Conv}^{1\times 1}(X)) \\
X_\text{spa} &= s_s X \\
Y &= f(X_\text{spa}, X_\text{chn})
\end{align}

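where $f$ denotes the fusion function, which can be maximum, addition, multiplication or concatenation. Here $\text{GAP}$ is global average pooling, $W_1$ and $W_2$ are the weights of the two fully connected layers of the SE block, $\delta$ is the ReLU activation, and $\sigma$ is the sigmoid function.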
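The equations above can be sketched as a forward pass in NumPy. This is an illustrative sketch under stated assumptions, not the authors' implementation: it handles a single sample of shape (C, H, W), omits bias terms, uses random weights, and introduces a reduction ratio `r` for the hidden layer of the channel branch. All function and parameter names are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    return np.maximum(z, 0.0)

def channel_se(x, w1, w2):
    """Channel SE. x: (C, H, W); w1: (C // r, C); w2: (C, C // r)."""
    squeezed = x.mean(axis=(1, 2))             # GAP(X), shape (C,)
    s_c = sigmoid(w2 @ relu(w1 @ squeezed))    # channel weights, shape (C,)
    return s_c[:, None, None] * x              # X_chn

def spatial_se(x, w_conv):
    """Spatial SE. w_conv: (C,), a 1x1 conv with one output channel."""
    # A 1x1 conv with a single output channel is a weighted sum over channels.
    s_s = sigmoid(np.tensordot(w_conv, x, axes=([0], [0])))  # shape (H, W)
    return s_s[None, :, :] * x                 # X_spa

def scse(x, w1, w2, w_conv, fusion="max"):
    """Fuse both branches: Y = f(X_spa, X_chn)."""
    x_chn = channel_se(x, w1, w2)
    x_spa = spatial_se(x, w_conv)
    if fusion == "max":
        return np.maximum(x_spa, x_chn)
    if fusion == "add":
        return x_spa + x_chn
    if fusion == "mul":
        return x_spa * x_chn
    if fusion == "concat":
        return np.concatenate([x_spa, x_chn], axis=0)  # doubles the channels
    raise ValueError(f"unknown fusion: {fusion}")

# Usage with hypothetical sizes and random weights.
rng = np.random.default_rng(0)
C, H, W, r = 8, 4, 4, 2
x = rng.standard_normal((C, H, W))
w1 = rng.standard_normal((C // r, C))
w2 = rng.standard_normal((C, C // r))
w_conv = rng.standard_normal(C)

y = scse(x, w1, w2, w_conv, fusion="max")
print(y.shape)  # (8, 4, 4)
```

Maximum, addition, and multiplication keep the output shape equal to the input shape, whereas concatenation doubles the channel count, so a layer that follows the block would need to accept 2C channels in that case.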

The proposed scSE block combines channel and spatial attention to enhance features while also capturing pixel-wise spatial information, which benefits segmentation tasks. Integrating scSE blocks into fully convolutional networks (F-CNNs) yields a consistent improvement in semantic segmentation at negligible extra cost.
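To give a rough sense of the extra cost, the following sketch counts the weights in one scSE block and compares them with a single 3x3 convolution of the same width. The channel count `C = 64`, the reduction ratio `r = 16`, and the comparison against a 3x3 convolution are assumptions made for illustration; biases are ignored and the figures are not taken from the paper.

```python
def cse_weights(channels, reduction):
    """Weights in the two fully connected layers of channel SE (biases ignored)."""
    hidden = channels // reduction
    return channels * hidden + hidden * channels

def sse_weights(channels):
    """Weights in a 1x1 convolution with one output channel (bias ignored)."""
    return channels

def conv3x3_weights(channels):
    """Weights in a 3x3 convolution with `channels` inputs and outputs (bias ignored)."""
    return 9 * channels * channels

C, r = 64, 16  # hypothetical channel count and reduction ratio
scse_total = cse_weights(C, r) + sse_weights(C)
ratio = scse_total / conv3x3_weights(C)
print(scse_total, conv3x3_weights(C), ratio)  # 576 36864 0.015625
```

Under these assumptions the block adds under 2% of the weights of one 3x3 convolution layer. A smaller reduction ratio increases the channel branch's share, while the spatial branch always adds only C weights.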