What is: Context Enhancement Module?

Context Enhancement Module (CEM) is a feature extraction module used in object detection (specifically, in ThunderNet) that aims to enlarge the receptive field. The key idea of CEM is to aggregate multi-scale local context information and global context information to generate more discriminative features. In CEM, feature maps from three scales are merged: $C_{4}$, $C_{5}$ and $C_{glb}$, where $C_{glb}$ is the global context feature vector obtained by applying global average pooling to $C_{5}$. A 1 × 1 convolution is then applied to each feature map to squeeze the number of channels to $\alpha \times p \times p = 245$.
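The two operations named above, global average pooling and 1 × 1 convolution, can be sketched in plain Python. This is a minimal illustration under stated assumptions, not ThunderNet's implementation: feature maps are nested lists indexed `[channel][row][col]`, the helpers `global_average_pool` and `conv1x1` are hypothetical names, the toy $C_{5}$ has only 4 channels on a 2 × 2 grid, and the weights are placeholders rather than learned values. The channel target assumes $\alpha = 5$ and $p = 7$ (5 × 7 × 7 = 245).

```python
from typing import List

# Sketch only: helper names, toy sizes and placeholder weights are assumptions,
# not ThunderNet's actual code or learned parameters.
FeatureMap = List[List[List[float]]]  # indexed [channel][row][col]


def global_average_pool(fmap: FeatureMap) -> List[float]:
    """Average each channel over all spatial positions (one value per channel)."""
    return [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in fmap
    ]


def conv1x1(fmap: FeatureMap, weights: List[List[float]], bias: List[float]) -> FeatureMap:
    """1x1 convolution: at every pixel, mix input channels with an (out x in) weight matrix."""
    height, width = len(fmap[0]), len(fmap[0][0])
    return [
        [
            [
                bias[o] + sum(w * fmap[i][y][x] for i, w in enumerate(weights[o]))
                for x in range(width)
            ]
            for y in range(height)
        ]
        for o in range(len(weights))
    ]


# Assumed values: alpha = 5, p = 7, so alpha * p * p = 245 output channels.
ALPHA, P = 5, 7
OUT_CHANNELS = ALPHA * P * P

# Toy C5: 4 channels on a 2x2 grid (real backbone maps are much larger).
c5 = [[[float(c + y + x) for x in range(2)] for y in range(2)] for c in range(4)]
c_glb = global_average_pool(c5)  # global context vector, one entry per C5 channel

# Placeholder weights: every output channel is the mean of the 4 input channels.
# They are reused for both branches only to keep the sketch short; the real
# module learns separate weights for each branch.
w = [[0.25] * 4 for _ in range(OUT_CHANNELS)]
b = [0.0] * OUT_CHANNELS

c5_squeezed = conv1x1(c5, w, b)
# C_glb has no spatial extent, so its 1x1 convolution acts on a 1x1 map,
# i.e. it behaves as a fully connected layer.
glb_squeezed = conv1x1([[[v]] for v in c_glb], w, b)

print(OUT_CHANNELS)  # 245
print(c_glb)  # [1.0, 2.0, 3.0, 4.0]
print(len(c5_squeezed), len(c5_squeezed[0]), len(c5_squeezed[0][0]))  # 245 2 2
```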

Afterwards, $C_{5}$ is upsampled by 2× and $C_{glb}$ is broadcast so that the three feature maps have the same spatial dimensions. Finally, the three resulting feature maps are aggregated. By leveraging both local and global context, CEM enlarges the receptive field and strengthens the representation ability of the thin (245-channel) feature map. Compared with prior FPN structures, CEM involves only two 1 × 1 convolutions (on $C_{4}$ and $C_{5}$) and one fc layer (because $C_{glb}$ is a vector, its 1 × 1 convolution is equivalent to a fully connected layer).
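The upsample, broadcast and aggregate steps above can also be sketched in plain Python. The description only says the maps are "aggregated", so this sketch assumes element-wise summation; it also assumes nearest-neighbour upsampling and that the inputs have already been squeezed to a common channel count. The helper names (`upsample2x_nearest`, `broadcast`, `cem_aggregate`) are hypothetical, and the toy maps use 2 channels instead of 245 so the output is easy to read.

```python
from typing import List

# Sketch only: nearest-neighbour upsampling and element-wise summation are
# assumptions; helper names and toy sizes are hypothetical.
FeatureMap = List[List[List[float]]]  # indexed [channel][row][col]


def upsample2x_nearest(fmap: FeatureMap) -> FeatureMap:
    """Double height and width by repeating each pixel in a 2x2 block."""
    return [
        [[row[x // 2] for x in range(2 * len(row))] for row in channel for _ in range(2)]
        for channel in fmap
    ]


def broadcast(vector: List[float], height: int, width: int) -> FeatureMap:
    """Copy each channel's single value to every spatial position."""
    return [[[v] * width for _ in range(height)] for v in vector]


def cem_aggregate(c4: FeatureMap, c5: FeatureMap, glb: List[float]) -> FeatureMap:
    """Merge the three squeezed inputs by element-wise sum (assumed aggregation)."""
    height, width = len(c4[0]), len(c4[0][0])
    c5_up = upsample2x_nearest(c5)
    if len(c5_up[0]) != height or len(c5_up[0][0]) != width:
        raise ValueError("C5 must have half the spatial size of C4")
    glb_map = broadcast(glb, height, width)
    return [
        [
            [c4[ch][y][x] + c5_up[ch][y][x] + glb_map[ch][y][x] for x in range(width)]
            for y in range(height)
        ]
        for ch in range(len(c4))
    ]


# Toy squeezed inputs with 2 channels instead of 245: C4 is 4x4, C5 is 2x2.
c4 = [[[1.0] * 4 for _ in range(4)] for _ in range(2)]
c5 = [[[10.0, 20.0], [30.0, 40.0]], [[0.0, 0.0], [0.0, 0.0]]]
glb = [100.0, 5.0]

out = cem_aggregate(c4, c5, glb)
print(out[0][0])  # [111.0, 111.0, 121.0, 121.0]
print(out[0][3])  # [131.0, 131.0, 141.0, 141.0]
print(out[1][2])  # [6.0, 6.0, 6.0, 6.0]
```

Under the summation assumption the output keeps the same channel count as each branch, so it can serve directly as the thin feature map; concatenating the branches instead would triple the channel count.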

Source	ThunderNet: Towards Real-time Generic Object Detection
Year	2019
Data Source	CC BY-SA - https://paperswithcode.com

Viet-Anh on Software
