What is: Channel Attention Module?

Source: CBAM: Convolutional Block Attention Module
Year: 2018
Data Source: CC BY-SA - https://paperswithcode.com

A Channel Attention Module is a module for channel-based attention in convolutional neural networks. We produce a channel attention map by exploiting the inter-channel relationship of features. As each channel of a feature map is considered a feature detector, channel attention focuses on ‘what’ is meaningful given an input image. To compute the channel attention efficiently, we squeeze the spatial dimension of the input feature map.

We first aggregate spatial information of a feature map by using both average-pooling and max-pooling operations, generating two different spatial context descriptors: $\mathbf{F}^{c}_{avg}$ and $\mathbf{F}^{c}_{max}$, which denote average-pooled features and max-pooled features respectively.
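
For intuition, here is a minimal PyTorch sketch of this squeeze step (a sketch assuming a standard `N×C×H×W` tensor layout; the tensor shapes and variable names are illustrative, not taken from the paper):

```python
import torch
import torch.nn.functional as F

# Toy input feature map F: batch of 2, C = 64 channels, 32x32 spatial size.
feat = torch.randn(2, 64, 32, 32)

# Squeeze the spatial dimensions down to 1x1, giving two C x 1 x 1 descriptors.
f_avg = F.adaptive_avg_pool2d(feat, 1)  # average-pooled descriptor F^c_avg
f_max = F.adaptive_max_pool2d(feat, 1)  # max-pooled descriptor F^c_max

print(f_avg.shape, f_max.shape)  # both: torch.Size([2, 64, 1, 1])
```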

Both descriptors are then forwarded to a shared network to produce our channel attention map $\mathbf{M}_{c} \in \mathbb{R}^{C\times 1\times 1}$. Here $C$ is the number of channels. The shared network is composed of a multi-layer perceptron (MLP) with one hidden layer. To reduce parameter overhead, the hidden activation size is set to $\mathbb{R}^{C/r\times 1\times 1}$, where $r$ is the reduction ratio. After the shared network is applied to each descriptor, we merge the output feature vectors using element-wise summation. In short, the channel attention is computed as:

$$\mathbf{M}_{c}\left(\mathbf{F}\right) = \sigma\left(\text{MLP}\left(\text{AvgPool}\left(\mathbf{F}\right)\right)+\text{MLP}\left(\text{MaxPool}\left(\mathbf{F}\right)\right)\right)$$

$$\mathbf{M}_{c}\left(\mathbf{F}\right) = \sigma\left(\mathbf{W}_{1}\left(\mathbf{W}_{0}\left(\mathbf{F}^{c}_{avg}\right)\right)+\mathbf{W}_{1}\left(\mathbf{W}_{0}\left(\mathbf{F}^{c}_{max}\right)\right)\right)$$

where $\sigma$ denotes the sigmoid function, $\mathbf{W}_{0} \in \mathbb{R}^{C/r\times C}$, and $\mathbf{W}_{1} \in \mathbb{R}^{C\times C/r}$. Note that the MLP weights, $\mathbf{W}_{0}$ and $\mathbf{W}_{1}$, are shared for both inputs, and $\mathbf{W}_{0}$ is followed by the ReLU activation function.
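
Putting the pieces together, a minimal PyTorch sketch of the module could look as follows (the class name, the default reduction ratio, and the use of 1×1 convolutions in place of fully connected layers are illustrative choices, not details fixed by the text above):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ChannelAttention(nn.Module):
    """Sketch of the channel attention map M_c described above."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        # Shared MLP: W_0 reduces C -> C/r, ReLU, then W_1 restores C/r -> C.
        # 1x1 convolutions act as fully connected layers on C x 1 x 1 inputs.
        self.w0 = nn.Conv2d(channels, channels // reduction, kernel_size=1, bias=False)
        self.w1 = nn.Conv2d(channels // reduction, channels, kernel_size=1, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Two C x 1 x 1 descriptors from average- and max-pooling.
        avg_desc = F.adaptive_avg_pool2d(x, 1)
        max_desc = F.adaptive_max_pool2d(x, 1)
        # The same weights (w0, w1) are applied to both descriptors.
        avg_out = self.w1(F.relu(self.w0(avg_desc)))
        max_out = self.w1(F.relu(self.w0(max_desc)))
        # Element-wise sum, then sigmoid -> attention map M_c in R^{C x 1 x 1}.
        return torch.sigmoid(avg_out + max_out)

# Usage: rescale the input feature map channel-wise with the attention map.
x = torch.randn(2, 64, 32, 32)
attention = ChannelAttention(channels=64)
refined = attention(x) * x  # broadcasts over the spatial dimensions
```

Dropping the max-pooling branch from this sketch recovers the Squeeze-and-Excitation formulation noted below.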

Note that the channel attention module with just average pooling is the same as the Squeeze-and-Excitation Module.