What is: Point-wise Spatial Attention?

Point-wise Spatial Attention (PSA) is a module for semantic segmentation. Its goal is to capture contextual information, especially long-range context, by aggregating information across positions. In the PSA module, this aggregation is treated as a kind of information flow: we adaptively learn a pixel-wise global attention map for each position from two perspectives, and use these maps to aggregate contextual information over the entire feature map.

The PSA module takes a spatial feature map $\mathbf{X}$ as input. We denote the spatial size of $\mathbf{X}$ as $H \times W$. Through two branches, ‘collect’ and ‘distribute’, several convolutional layers generate a pixel-wise global attention map for each position in $\mathbf{X}$.

We aggregate the input feature maps according to the attention maps to generate new feature representations that incorporate long-range contextual information, i.e., $\mathbf{Z}_{c}$ from the ‘collect’ branch and $\mathbf{Z}_{d}$ from the ‘distribute’ branch.
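
The aggregation step can be sketched in NumPy. This is a minimal illustration rather than the authors' implementation. It assumes the feature map is flattened to shape `(N, C)` with `N = H * W`, and that each branch's attention maps are already available as a dense `N x N` array. The function names and the indexing convention (`A_c[i, j]` is the weight position `i` gives to `j` when collecting; `A_d[j, i]` is the weight with which position `j` distributes to `i`) are assumptions made for this example, and any normalization of the attention weights is left out.

```python
import numpy as np

def aggregate_collect(X, A_c):
    """Collect branch: Z_c[i] = sum_j A_c[i, j] * X[j].

    Position i reads its own attention map (row i) and gathers
    features from every position j.
    """
    return A_c @ X

def aggregate_distribute(X, A_d):
    """Distribute branch: Z_d[i] = sum_j A_d[j, i] * X[j].

    Every position j uses its own attention map (row j) to decide
    how much of its feature to send to position i.
    """
    return A_d.T @ X

# Toy sizes, chosen only for illustration.
H, W, C = 2, 3, 4
N = H * W
rng = np.random.default_rng(0)
X = rng.standard_normal((N, C))   # flattened input feature map
A_c = rng.random((N, N))          # stand-in for learned 'collect' attention
A_d = rng.random((N, N))          # stand-in for learned 'distribute' attention

Z_c = aggregate_collect(X, A_c)   # shape (N, C)
Z_d = aggregate_distribute(X, A_d)  # shape (N, C)
```

The only difference between the two branches in this sketch is whose attention map supplies the weight: the receiving position in ‘collect’, the sending position in ‘distribute’.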

We concatenate the new representations $\mathbf{Z}_{c}$ and $\mathbf{Z}_{d}$ and apply a convolutional layer with batch normalization and activation layers for dimension reduction and feature fusion. We then concatenate the resulting global contextual feature with the local representation feature $\mathbf{X}$. This is followed by one or more convolutional layers with batch normalization and activation layers, which generate the final feature map for the subsequent subnetworks.
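
The fusion steps above can be sketched as follows. This is a simplified illustration under stated assumptions, not the published architecture: features are flattened to `(N, channels)`, each convolution is taken to be 1x1 (so it reduces to a matrix product), batch normalization is computed over the positions of a single sample with no learnable scale or shift, and the activation is ReLU. The helper names `conv1x1_bn_relu` and `fuse`, the weight shapes, and the random inputs are all hypothetical.

```python
import numpy as np

def conv1x1_bn_relu(F, weight, eps=1e-5):
    """Assumed 1x1 convolution on (N, C_in) features, then a simplified
    batch norm (per-channel, over positions) and ReLU."""
    Y = F @ weight                                   # (N, C_out)
    Y = (Y - Y.mean(axis=0)) / np.sqrt(Y.var(axis=0) + eps)
    return np.maximum(Y, 0.0)

def fuse(X, Z_c, Z_d, w_reduce, w_final):
    """Concatenate both branches, reduce, concatenate with X, convolve."""
    Z = np.concatenate([Z_c, Z_d], axis=1)           # both branch outputs
    G = conv1x1_bn_relu(Z, w_reduce)                 # global contextual feature
    local_and_global = np.concatenate([X, G], axis=1)
    return conv1x1_bn_relu(local_and_global, w_final)

# Toy sizes and random stand-ins for the branch outputs and weights.
N, C, C_branch, C_out = 6, 4, 3, 5
rng = np.random.default_rng(0)
X = rng.standard_normal((N, C))
Z_c = rng.standard_normal((N, C_branch))
Z_d = rng.standard_normal((N, C_branch))
w_reduce = rng.standard_normal((2 * C_branch, C))    # dimension reduction
w_final = rng.standard_normal((2 * C, C_out))        # final fusion layer

out = fuse(X, Z_c, Z_d, w_reduce, w_final)           # shape (N, C_out)
```

Only one final convolution is shown here; the text allows for several.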

Source	PSANet: Point-wise Spatial Attention Network for Scene Parsing
Year	2018
Data Source	CC BY-SA - https://paperswithcode.com