What is: Coordinate attention?

Source: Coordinate Attention for Efficient Mobile Network Design
Year: 2021
Data Source: CC BY-SA - https://paperswithcode.com

Hou et al. proposed coordinate attention, a novel attention mechanism which embeds positional information into channel attention, so that the network can focus on large important regions at little computational cost.

The coordinate attention mechanism has two consecutive steps: coordinate information embedding and coordinate attention generation. First, two spatial extents of pooling kernels encode each channel horizontally and vertically. In the second step, a shared $1\times 1$ convolutional transformation is applied to the concatenated outputs of the two pooling layers. Coordinate attention then splits the resulting tensor into two separate tensors and transforms each into an attention vector with the same number of channels as the input $X$, one for the horizontal and one for the vertical coordinate. This can be written as

\begin{align}
z^h &= \text{GAP}^h(X) \\
z^w &= \text{GAP}^w(X) \\
f &= \delta(\text{BN}(\text{Conv}_1^{1\times 1}([z^h;z^w]))) \\
f^h, f^w &= \text{Split}(f) \\
s^h &= \sigma(\text{Conv}_h^{1\times 1}(f^h)) \\
s^w &= \sigma(\text{Conv}_w^{1\times 1}(f^w)) \\
Y &= X s^h s^w
\end{align}

where $\text{GAP}^h$ and $\text{GAP}^w$ denote pooling functions for vertical and horizontal coordinates, $[\cdot\,;\cdot]$ denotes concatenation, $\text{BN}$ is batch normalization, $\delta$ is a nonlinear activation, $\sigma$ is the sigmoid function, and $s^h \in \mathbb{R}^{C\times 1\times W}$ and $s^w \in \mathbb{R}^{C\times H\times 1}$ represent the corresponding attention weights. The products in the last equation are element-wise, with $s^h$ and $s^w$ broadcast to the shape of $X$.
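As a rough illustration of the two steps above, the following NumPy sketch runs the forward pass on a single feature map. It is not the authors' implementation: the weight matrices `w1`, `w_h`, `w_w`, the reduction ratio `r`, the choice of ReLU for δ, and the per-channel normalisation standing in for BN are all assumptions made for this example. The 1×1 convolutions are written as bias-free matrix products over the channel axis, and the shapes of `s_h` and `s_w` follow the ones stated in the text.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def coordinate_attention(x, w1, w_h, w_w, eps=1e-5):
    """Hypothetical single-sample forward pass.

    x:   input feature map, shape (C, H, W)
    w1:  shared 1x1 conv weights, shape (C_mid, C)   (assumed, no bias)
    w_h: 1x1 conv weights for the s^h branch, shape (C, C_mid)
    w_w: 1x1 conv weights for the s^w branch, shape (C, C_mid)
    """
    c, h, w = x.shape

    # Step 1: coordinate information embedding (two directional poolings).
    z_h = x.mean(axis=1)  # average over height -> (C, W)
    z_w = x.mean(axis=2)  # average over width  -> (C, H)

    # Step 2: coordinate attention generation.
    z = np.concatenate([z_h, z_w], axis=1)  # (C, W + H)
    f = w1 @ z                              # shared 1x1 conv -> (C_mid, W + H)

    # Stand-in for BN (assumption): normalise each channel over positions,
    # without learned scale/shift or running statistics.
    mean = f.mean(axis=1, keepdims=True)
    var = f.var(axis=1, keepdims=True)
    f = (f - mean) / np.sqrt(var + eps)

    f = np.maximum(f, 0.0)  # delta: ReLU chosen here as an assumption

    f_h, f_w = f[:, :w], f[:, w:]  # split back into the two directions

    s_h = sigmoid(w_h @ f_h)[:, None, :]  # (C, 1, W)
    s_w = sigmoid(w_w @ f_w)[:, :, None]  # (C, H, 1)

    # Element-wise reweighting; s_h and s_w broadcast to (C, H, W).
    return x * s_h * s_w, s_h, s_w


# Usage with random weights (illustrative values only).
rng = np.random.default_rng(0)
C, H, W, r = 8, 5, 7, 4  # r is an assumed channel reduction ratio
x = rng.standard_normal((C, H, W))
w1 = rng.standard_normal((C // r, C))
w_h = rng.standard_normal((C, C // r))
w_w = rng.standard_normal((C, C // r))

y, s_h, s_w = coordinate_attention(x, w1, w_h, w_w)
print(y.shape, s_h.shape, s_w.shape)  # (8, 5, 7) (8, 1, 7) (8, 5, 1)
```

Because both attention tensors pass through a sigmoid, every weight lies between 0 and 1, so the output is a position-dependent attenuation of the input: each element is scaled by one factor that depends on its column and one that depends on its row.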

Using coordinate attention, the network can accurately obtain the position of a targeted object. This approach has a larger receptive field than BAM and CBAM. Like an SE block, it also models cross-channel relationships, effectively enhancing the expressive power of the learned features. Due to its lightweight design and flexibility, it can be easily used in classical building blocks of mobile networks.