What is: Context Enhancement Module?

Source: ThunderNet: Towards Real-time Generic Object Detection
Year: 2019
Data Source: CC BY-SA - https://paperswithcode.com

Context Enhancement Module (CEM) is a feature extraction module used in object detection (specifically, in ThunderNet) that aims to enlarge the receptive field. The key idea of CEM is to aggregate multi-scale local context information and global context information to generate more discriminative features. In CEM, feature maps from three scales are merged: $C_4$, $C_5$ and $C_{glb}$. $C_{glb}$ is the global context feature vector obtained by applying global average pooling to $C_5$. A 1 × 1 convolution is then applied to each feature map to squeeze the number of channels to $\alpha \times p \times p = 245$.
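The channel-squeezing step described above can be sketched in NumPy. This is only an illustration under stated assumptions, not the ThunderNet implementation: the input channel counts (120 and 512), the 20 × 20 and 10 × 10 spatial sizes, the random weights, and the helper names `conv1x1` and `global_average_pool` are all hypothetical, and α = 5, p = 7 are chosen here simply because they give 245.

```python
import numpy as np

# Assumed values: chosen so that alpha * p * p = 245, as stated in the text.
ALPHA, P = 5, 7
OUT_CHANNELS = ALPHA * P * P  # 245


def conv1x1(x, weight, bias):
    """1x1 convolution on a (C_in, H, W) map: a per-pixel linear map over channels."""
    # weight has shape (C_out, C_in); bias has shape (C_out,).
    return np.einsum("oc,chw->ohw", weight, x) + bias[:, None, None]


def global_average_pool(x):
    """Average a (C, H, W) map over its spatial dimensions, giving a (C,) vector."""
    return x.mean(axis=(1, 2))


rng = np.random.default_rng(0)

# Hypothetical backbone outputs; real channel counts depend on the backbone.
c4 = rng.standard_normal((120, 20, 20))
c5 = rng.standard_normal((512, 10, 10))

# Random placeholder weights standing in for learned parameters.
w4, b4 = rng.standard_normal((OUT_CHANNELS, 120)) * 0.01, np.zeros(OUT_CHANNELS)
w5, b5 = rng.standard_normal((OUT_CHANNELS, 512)) * 0.01, np.zeros(OUT_CHANNELS)
w_glb, b_glb = rng.standard_normal((OUT_CHANNELS, 512)) * 0.01, np.zeros(OUT_CHANNELS)

c4_sq = conv1x1(c4, w4, b4)        # shape (245, 20, 20)
c5_sq = conv1x1(c5, w5, b5)        # shape (245, 10, 10)

c_glb = global_average_pool(c5)    # shape (512,)
# On a 1x1 spatial map, a 1x1 convolution reduces to a fully connected layer.
c_glb_sq = w_glb @ c_glb + b_glb   # shape (245,)
```

Because the global branch operates on a single vector, its 1 × 1 convolution is equivalent to a fully connected layer, which is why only the $C_4$ and $C_5$ branches need true convolutions.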

Afterwards, $C_5$ is upsampled by 2× and $C_{glb}$ is broadcast so that the spatial dimensions of the three feature maps are equal. Finally, the three resulting feature maps are aggregated. By leveraging both local and global context, CEM effectively enlarges the receptive field and refines the representation ability of the thin feature map. Compared with prior FPN structures, CEM involves only two 1×1 convolutions and an FC layer.
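The alignment and aggregation step can be sketched as follows, taking the three channel-squeezed features as inputs. Two details are assumptions, since the text only says "upsampled by 2×" and "aggregated": nearest-neighbor upsampling and element-wise summation are used here. The function names `upsample_nearest_2x` and `fuse_cem_features` are hypothetical.

```python
import numpy as np


def upsample_nearest_2x(x):
    """Upsample a (C, H, W) map to (C, 2H, 2W) by repeating each pixel.

    Assumption: nearest-neighbor interpolation; the text does not name the method.
    """
    return x.repeat(2, axis=1).repeat(2, axis=2)


def fuse_cem_features(c4_sq, c5_sq, c_glb_sq):
    """Align the three squeezed features spatially and aggregate them.

    c4_sq:    (C, H, W)      local context at the finer scale
    c5_sq:    (C, H/2, W/2)  local context at the coarser scale
    c_glb_sq: (C,)           global context vector
    Assumption: aggregation is an element-wise sum.
    """
    c5_up = upsample_nearest_2x(c5_sq)       # (C, H, W)
    c_glb_map = c_glb_sq[:, None, None]      # (C, 1, 1), broadcast over H and W
    return c4_sq + c5_up + c_glb_map


# Usage with constant placeholder features, 245 channels each.
c4_sq = np.zeros((245, 20, 20))
c5_sq = np.ones((245, 10, 10))
c_glb_sq = np.full(245, 2.0)

fused = fuse_cem_features(c4_sq, c5_sq, c_glb_sq)  # shape (245, 20, 20)
```

The fusion itself adds no learned parameters under these assumptions, so the module's parameter cost comes only from the channel-squeezing layers.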