
What is: Attention Gate?

Source: Attention U-Net: Learning Where to Look for the Pancreas
Year: 2018
Data Source: CC BY-SA - https://paperswithcode.com

The attention gate focuses on targeted regions while suppressing feature activations in irrelevant regions. Given the input feature map $X$ and the gating signal $G \in \mathbb{R}^{C' \times H \times W}$, which is collected at a coarser scale and carries contextual information, the attention gate uses additive attention to obtain the gating coefficient. Both the input $X$ and the gating signal $G$ are first linearly mapped to an $\mathbb{R}^{F \times H \times W}$-dimensional space, and the result is then squeezed in the channel dimension to produce a spatial attention weight map $S \in \mathbb{R}^{1 \times H \times W}$. The overall process can be written as
\begin{align}
S &= \sigma(\varphi(\delta(\phi_x(X) + \phi_g(G)))) \\
Y &= S \cdot X
\end{align}
where $\varphi$, $\phi_x$ and $\phi_g$ are linear transformations implemented as $1 \times 1$ convolutions, $\delta$ is a ReLU activation, and $\sigma$ is the sigmoid function, so each spatial weight in $S$ lies in $(0, 1)$ and rescales all channels of $X$ at that position.
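The equations above can be sketched in plain NumPy. This is a minimal illustration, not the paper's implementation: the function name and weight-matrix arguments (`Wx`, `Wg`, `Wpsi`) are hypothetical, and it assumes the gating signal has already been resampled to the same spatial size as the input (in practice the gate also handles resampling and bias terms).

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def attention_gate(X, G, Wx, Wg, Wpsi):
    """Additive attention gate (illustrative sketch).

    X    : (C,  H, W)  input feature map
    G    : (C', H, W)  gating signal, assumed already at X's spatial size
    Wx   : (F, C)      phi_x, a 1x1 conv mapping X to F channels
    Wg   : (F, C')     phi_g, a 1x1 conv mapping G to F channels
    Wpsi : (1, F)      varphi, a 1x1 conv squeezing F channels to one
    """
    # A 1x1 convolution is a per-pixel linear map over the channel axis.
    q = np.einsum('fc,chw->fhw', Wx, X) + np.einsum('fc,chw->fhw', Wg, G)
    # Spatial attention weight map S with shape (1, H, W), entries in (0, 1).
    S = sigmoid(np.einsum('of,fhw->ohw', Wpsi, relu(q)))
    # Broadcast S over the channel axis: Y = S * X.
    return S * X

# Usage with random features (shapes chosen arbitrarily for the demo):
rng = np.random.default_rng(0)
X = rng.normal(size=(8, 4, 4))    # C=8 input channels
G = rng.normal(size=(16, 4, 4))   # C'=16 gating channels
Wx, Wg, Wpsi = (rng.normal(size=s) for s in [(4, 8), (4, 16), (1, 4)])
Y = attention_gate(X, G, Wx, Wg, Wpsi)
```

Because the sigmoid keeps every weight in $(0, 1)$, the gate can only attenuate features, never amplify them, which is what lets it suppress responses in irrelevant regions.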

The attention gate guides the model's attention to important regions while suppressing feature activations in unrelated areas. Thanks to its lightweight design, it substantially enhances the model's representational power without a significant increase in computational cost or parameter count. It is also generic and modular, making it easy to integrate into a wide range of CNN architectures.