
What is: Attention Gate?

Source: Attention U-Net: Learning Where to Look for the Pancreas
Year: 2018
Data Source: CC BY-SA - https://paperswithcode.com

The attention gate focuses on targeted regions while suppressing feature activations in irrelevant regions. Given the input feature map $X$ and the gating signal $G \in \mathbb{R}^{C' \times H \times W}$, which is collected at a coarser scale and carries contextual information, the attention gate uses additive attention to obtain the gating coefficient. Both the input $X$ and the gating signal $G$ are first linearly mapped to an $\mathbb{R}^{F \times H \times W}$-dimensional space, and the result is then squeezed in the channel dimension to produce a spatial attention weight map $S \in \mathbb{R}^{1 \times H \times W}$. The overall process can be written as
\begin{align}
S &= \sigma(\varphi(\delta(\phi_x(X) + \phi_g(G)))) \\
Y &= S \cdot X
\end{align}
where $\varphi$, $\phi_x$ and $\phi_g$ are linear transformations implemented as $1 \times 1$ convolutions, $\delta$ is a ReLU activation, and $\sigma$ is the sigmoid function, so each spatial weight in $S$ lies in $(0, 1)$ and rescales all channels of $X$ at that position.
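The equations above can be sketched in plain NumPy. This is a minimal illustration, not the paper's implementation: the function name and weight-matrix arguments (`Wx`, `Wg`, `Wpsi`) are hypothetical, and it assumes the gating signal has already been resampled to the same spatial size as the input (in practice the gate also handles resampling and bias terms).

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def attention_gate(X, G, Wx, Wg, Wpsi):
    """Additive attention gate (illustrative sketch).

    X    : (C,  H, W)  input feature map
    G    : (C', H, W)  gating signal, assumed already at X's spatial size
    Wx   : (F, C)      phi_x, a 1x1 conv mapping X to F channels
    Wg   : (F, C')     phi_g, a 1x1 conv mapping G to F channels
    Wpsi : (1, F)      varphi, a 1x1 conv squeezing F channels to one
    """
    # A 1x1 convolution is a per-pixel linear map over the channel axis.
    q = np.einsum('fc,chw->fhw', Wx, X) + np.einsum('fc,chw->fhw', Wg, G)
    # Spatial attention weight map S with shape (1, H, W), entries in (0, 1).
    S = sigmoid(np.einsum('of,fhw->ohw', Wpsi, relu(q)))
    # Broadcast S over the channel axis: Y = S * X.
    return S * X

# Usage with random features (shapes chosen arbitrarily for the demo):
rng = np.random.default_rng(0)
X = rng.normal(size=(8, 4, 4))    # C=8 input channels
G = rng.normal(size=(16, 4, 4))   # C'=16 gating channels
Wx, Wg, Wpsi = (rng.normal(size=s) for s in [(4, 8), (4, 16), (1, 4)])
Y = attention_gate(X, G, Wx, Wg, Wpsi)
```

Because the sigmoid keeps every weight in $(0, 1)$, the gate can only attenuate features, never amplify them, which is what lets it suppress responses in irrelevant regions.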

The attention gate guides the model's attention to important regions while suppressing feature activations in unrelated areas. Thanks to its lightweight design, it substantially enhances the model's representational power without a significant increase in computational cost or parameter count. It is also generic and modular, making it easy to integrate into a wide range of CNN architectures.