Viet-Anh on Software

What is: Point-wise Spatial Attention?

Source: PSANet: Point-wise Spatial Attention Network for Scene Parsing
Year: 2018
Data Source: CC BY-SA - https://paperswithcode.com

Point-wise Spatial Attention (PSA) is a semantic segmentation module. Its goal is to capture contextual information, especially long-range context, by aggregating information across the entire feature map. In the PSA module, information aggregation is treated as a form of information flow: for each position, the module adaptively learns a pixel-wise global attention map from two perspectives ('collect' and 'distribute') and uses it to aggregate contextual information over the whole feature map.

The PSA module takes a spatial feature map $\mathbf{X}$ as input. We denote the spatial size of $\mathbf{X}$ as $H \times W$. In each of the two branches, as illustrated, several convolutional layers generate a pixel-wise global attention map for every position in $\mathbf{X}$.
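To make the shapes concrete, here is a minimal NumPy sketch of one branch's attention-map prediction. It assumes the "several convolutional layers" are two 1×1 convolutions (a reduction with ReLU, then a projection to $H \times W$ output channels); the function name `predict_attention_maps` and the weights `w1` and `w2` are hypothetical, and no normalization of the maps is applied. Because the output channel count equals $H \times W$, the sketch assumes a fixed spatial size.

```python
import numpy as np

def predict_attention_maps(x, w1, w2):
    """Predict one global attention map per position (hypothetical sketch).

    x:  feature map of shape (C, H, W).
    w1: hypothetical 1x1-conv weights of shape (C_mid, C).
    w2: hypothetical 1x1-conv weights of shape (H*W, C_mid).
    Returns an (N, N) array with N = H*W, where row i is the attention
    map for position i over all N positions.
    """
    c, h, w = x.shape
    flat = x.reshape(c, h * w)           # (C, N): one column per position
    hidden = np.maximum(w1 @ flat, 0.0)  # 1x1 conv + ReLU -> (C_mid, N)
    logits = w2 @ hidden                 # 1x1 conv -> (N channels, N positions)
    return logits.T                      # row i = channels predicted at position i
```

Each position thus predicts $H \times W$ values, one weight for every position in the feature map, which is what makes the attention "global" and "pixel-wise".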

We aggregate the input feature map according to these attention maps to generate new feature representations that incorporate long-range contextual information, i.e., $\mathbf{Z}_c$ from the ‘collect’ branch and $\mathbf{Z}_d$ from the ‘distribute’ branch.
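A minimal NumPy sketch of the aggregation step follows, under stated assumptions: attention maps are given as $N \times N$ arrays with $N = HW$, results are averaged with a $1/N$ factor, the collect branch gathers features into position $i$ using position $i$'s own map, and the distribute branch uses the transposed map (position $j$'s map says how much of $x_j$ is sent to position $i$). The function name `aggregate` is hypothetical.

```python
import numpy as np

def aggregate(x, a_collect, a_distribute):
    """Aggregate features with per-position attention maps (hypothetical sketch).

    x: (C, H, W) input features; a_collect, a_distribute: (N, N), N = H*W.
    Collect:    z_c[:, i] = (1/N) * sum_j a_collect[i, j]    * x[:, j]
    Distribute: z_d[:, i] = (1/N) * sum_j a_distribute[j, i] * x[:, j]
    """
    c, h, w = x.shape
    n = h * w
    flat = x.reshape(c, n)           # (C, N)
    z_c = flat @ a_collect.T / n     # gather: row i of a_collect weights the inputs
    z_d = flat @ a_distribute / n    # scatter: column i collects what others send to i
    return z_c.reshape(c, h, w), z_d.reshape(c, h, w)
```

Both branches are plain matrix products over the flattened spatial dimension, so every output position can draw on every input position regardless of distance.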

We concatenate the new representations $\mathbf{Z}_c$ and $\mathbf{Z}_d$ and apply a convolutional layer with batch normalization and an activation layer for dimension reduction and feature fusion. We then concatenate this new global contextual feature with the local representation $\mathbf{X}$. This is followed by one or several convolutional layers with batch normalization and activation layers that produce the final feature map for the subsequent subnetworks.
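The fusion path can be sketched in NumPy as below. This is an illustration under assumptions, not the reference implementation: the convolutions are taken to be 1×1, a single output convolution stands in for "one or several", the activation is ReLU, and batch normalization is simplified to per-channel standardization over the spatial positions of one image (no learned scale/shift, no running statistics). The names `conv1x1_bn_relu`, `psa_fuse`, `w_reduce`, and `w_out` are hypothetical.

```python
import numpy as np

def conv1x1_bn_relu(x, weight, eps=1e-5):
    """1x1 conv, simplified batch norm, then ReLU (hypothetical sketch).

    x: (C_in, H, W); weight: (C_out, C_in).
    The normalization standardizes each output channel over its H*W
    positions; it stands in for batch norm on a single image.
    """
    y = np.einsum("oc,chw->ohw", weight, x)       # 1x1 convolution
    mean = y.mean(axis=(1, 2), keepdims=True)
    var = y.var(axis=(1, 2), keepdims=True)
    return np.maximum((y - mean) / np.sqrt(var + eps), 0.0)

def psa_fuse(x, z_c, z_d, w_reduce, w_out):
    """Fuse collect/distribute context with the local features x."""
    # Concatenate Z_c and Z_d along channels, then reduce dimensions.
    global_ctx = conv1x1_bn_relu(np.concatenate([z_c, z_d], axis=0), w_reduce)
    # Concatenate the global context with the local features X.
    fused = np.concatenate([global_ctx, x], axis=0)
    # Final conv + BN + ReLU producing the output feature map.
    return conv1x1_bn_relu(fused, w_out)
```

With `x`, `z_c`, and `z_d` each of shape `(C, H, W)`, `w_reduce` must have shape `(C_red, 2*C)` and `w_out` shape `(C_final, C_red + C)`; the output then has shape `(C_final, H, W)`. Keeping $\mathbf{X}$ in the second concatenation lets the final layer combine global context with the original local detail.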