What is: Scale-wise Feature Aggregation Module?
Source | M2Det: A Single-Shot Object Detector based on Multi-Level Feature Pyramid Network |
Year | 2019 |
Data Source | CC BY-SA - https://paperswithcode.com |
SFAM, or Scale-wise Feature Aggregation Module, is a feature extraction block from the M2Det architecture. It aims to aggregate the multi-level multi-scale features generated by Thinned U-Shaped Modules into a multi-level feature pyramid.
The first stage of SFAM is to concatenate features of the equivalent scale together along the channel dimension. The aggregated feature pyramid can be presented as $\mathbf{X} = [\mathbf{X}_1, \mathbf{X}_2, \ldots, \mathbf{X}_i, \ldots]$, where $\mathbf{X}_i = \text{Concat}(\mathbf{x}_i^1, \mathbf{x}_i^2, \ldots, \mathbf{x}_i^L) \in \mathbb{R}^{W_i \times H_i \times C}$ refers to the features of the $i$-th largest scale. Here, each scale in the aggregated pyramid contains features from multi-level depths.
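As a minimal sketch of this first stage, the snippet below concatenates same-scale feature maps from $L$ Thinned U-Shaped Modules along the channel axis; the NHWC layout, array sizes, and function name are illustrative assumptions, not M2Det's actual implementation.

```python
import numpy as np

def aggregate_same_scale(tum_outputs):
    """Hypothetical sketch of SFAM's concatenation stage.

    tum_outputs: list over L TUM levels; each entry is a list over scales
    of arrays shaped (H_i, W_i, C), using an HWC layout for clarity.
    Returns one array per scale with L*C channels.
    """
    num_scales = len(tum_outputs[0])
    pyramid = []
    for i in range(num_scales):
        # gather the i-th-scale map from every TUM level, then concat on channels
        same_scale = [level[i] for level in tum_outputs]
        pyramid.append(np.concatenate(same_scale, axis=-1))
    return pyramid

# toy example: L = 2 TUM levels, 3 scales, C = 4 channels each
L, C = 2, 4
sizes = [40, 20, 10]
tums = [[np.random.rand(s, s, C) for s in sizes] for _ in range(L)]
pyr = aggregate_same_scale(tums)
print([x.shape for x in pyr])  # each scale now carries L*C = 8 channels
```

Each pyramid level keeps its own spatial resolution; only the channel count grows, which is what makes the subsequent channel-wise attention step meaningful.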
However, a simple concatenation operation is not adaptive enough. In the second stage, we introduce a channel-wise attention module to encourage features to focus on the channels that benefit them most. Following Squeeze-and-Excitation, we use global average pooling to generate channel-wise statistics $\mathbf{z} \in \mathbb{R}^C$ at the squeeze step. To fully capture channel-wise dependencies, the subsequent excitation step learns the attention mechanism via two fully connected layers:
$$\mathbf{s} = \mathbf{F}_{ex}(\mathbf{z}, \mathbf{W}) = \sigma(\mathbf{W}_2\,\delta(\mathbf{W}_1\mathbf{z})),$$

where $\delta$ refers to the ReLU function, $\sigma$ refers to the sigmoid function, $\mathbf{W}_1 \in \mathbb{R}^{\frac{C}{r} \times C}$, $\mathbf{W}_2 \in \mathbb{R}^{C \times \frac{C}{r}}$, and $r$ is the reduction ratio ($r = 16$ in our experiments). The final output $\tilde{\mathbf{X}}_i$ is obtained by reweighting the input $\mathbf{X}_i$ with the activation $\mathbf{s}$:

$$\tilde{\mathbf{X}}_i^c = \mathbf{F}_{scale}(\mathbf{X}_i^c, s_c) = s_c \cdot \mathbf{X}_i^c,$$

where $\tilde{\mathbf{X}}_i = [\tilde{\mathbf{X}}_i^1, \tilde{\mathbf{X}}_i^2, \ldots, \tilde{\mathbf{X}}_i^C]$, and each of the features is enhanced or weakened by the rescaling operation.
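The squeeze-excitation-rescale sequence above can be sketched in a few lines of numpy; the weight shapes, the small reduction ratio, and the random initialization below are toy assumptions for illustration, not trained M2Det parameters.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def se_attention(X, W1, W2):
    """Hypothetical sketch of SFAM's channel-wise attention.

    X: (H, W, C) aggregated feature map at one scale.
    W1: (C//r, C) and W2: (C, C//r) are the two fully connected layers.
    Returns X with every channel c rescaled by the activation s_c.
    """
    z = X.mean(axis=(0, 1))                     # squeeze: global average pooling -> (C,)
    s = sigmoid(W2 @ np.maximum(W1 @ z, 0.0))   # excitation: FC -> ReLU -> FC -> sigmoid
    return X * s                                # rescale: broadcast s over H and W

# toy setup: C = 8 channels, reduction ratio r = 4
C, r = 8, 4
rng = np.random.default_rng(0)
X = rng.standard_normal((10, 10, C))
W1 = rng.standard_normal((C // r, C)) * 0.1
W2 = rng.standard_normal((C, C // r)) * 0.1
Y = se_attention(X, W1, W2)
print(Y.shape)  # same shape as X; only channel magnitudes change
```

Because $\mathbf{s}$ is a per-channel scalar in $(0, 1)$ after the sigmoid, the module can only attenuate or (relatively) emphasize channels; it never alters the spatial layout of the features.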