What is: Balanced Feature Pyramid?

Source: Libra R-CNN: Towards Balanced Learning for Object Detection
Year: 2019
Data Source: CC BY-SA - https://paperswithcode.com

Balanced Feature Pyramid (BFP) is a feature pyramid module. Unlike approaches such as FPN, which integrate multi-level features through lateral connections, BFP strengthens every level using the same deeply integrated, balanced semantic features. The pipeline is shown in the Figure to the right. It consists of four steps: rescaling, integrating, refining, and strengthening.

Features at resolution level $l$ are denoted as $C_{l}$. The number of multi-level features is denoted as $L$. The indices of the lowest and highest levels involved are denoted as $l_{min}$ and $l_{max}$. In the Figure, $C_{2}$ has the highest resolution. To integrate the multi-level features while preserving their semantic hierarchy, the features $\{C_{2}, C_{3}, C_{4}, C_{5}\}$ are first resized to an intermediate size, i.e., the same size as $C_{4}$, using interpolation (to upsample the lower-resolution levels) and max-pooling (to downsample the higher-resolution levels). Once the features are rescaled, the balanced semantic features are obtained by simple averaging:

$$C = \frac{1}{L}\sum_{l=l_{min}}^{l_{max}} C_{l}$$
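The rescaling and integrating steps can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the helper names (`max_pool`, `upsample_nearest`, `resize_to`, `integrate`) are hypothetical, the maps are assumed to be square with sides that differ by integer factors, and nearest-neighbour interpolation is assumed because the text does not name a specific mode.

```python
import numpy as np

def max_pool(x, k):
    """Downsample a (C, H, W) map by an integer factor k with k x k max-pooling."""
    c, h, w = x.shape
    return x.reshape(c, h // k, k, w // k, k).max(axis=(2, 4))

def upsample_nearest(x, k):
    """Upsample a (C, H, W) map by an integer factor k (nearest neighbour)."""
    return x.repeat(k, axis=1).repeat(k, axis=2)

def resize_to(x, size):
    """Resize a square map to side `size` (assumption: integer scale factors)."""
    h = x.shape[1]
    if h > size:
        return max_pool(x, h // size)
    if h < size:
        return upsample_nearest(x, size // h)
    return x

def integrate(features, ref_index=2):
    """Average all levels after resizing them to the size of features[ref_index]."""
    size = features[ref_index].shape[1]
    resized = [resize_to(f, size) for f in features]
    return sum(resized) / len(resized)  # C = (1 / L) * sum_l C_l

# Toy pyramid standing in for C2..C5: 4 channels, sides 32, 16, 8, 4.
rng = np.random.default_rng(0)
pyramid = [rng.standard_normal((4, s, s)) for s in (32, 16, 8, 4)]
balanced = integrate(pyramid)  # same shape as the C4 stand-in: (4, 8, 8)
```

With `ref_index=2` the third entry of the list plays the role of $C_{4}$, so every level is brought to its size before averaging.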

The obtained features are then rescaled back to each level's resolution using the same procedure in reverse, and are used to strengthen the original features. Each resolution receives equal information from the others in this procedure. Note that this procedure has no learnable parameters. The authors observe an improvement with this nonparametric method, which indicates that the information flow is effective.
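The strengthening step can be sketched as follows. The element-wise addition of the rescaled balanced features to each original level is an assumption made for illustration, since the text only says that the original features are strengthened. The names `resize_to` and `strengthen` are hypothetical, and the maps are again assumed to be square with integer scale factors.

```python
import numpy as np

def resize_to(x, size):
    """Resize a square (C, H, H) map to (C, size, size), integer factors only."""
    c, h, _ = x.shape
    if h > size:
        k = h // size  # downsample with k x k max-pooling
        return x.reshape(c, size, k, size, k).max(axis=(2, 4))
    k = size // h      # upsample with nearest neighbour (k == 1 leaves the size unchanged)
    return x.repeat(k, axis=1).repeat(k, axis=2)

def strengthen(features, balanced):
    """Rescale the balanced map to every level and add it to that level (assumed residual)."""
    return [f + resize_to(balanced, f.shape[1]) for f in features]

# Toy inputs: four levels with sides 32, 16, 8, 4 and a balanced map of side 8.
rng = np.random.default_rng(0)
pyramid = [rng.standard_normal((4, s, s)) for s in (32, 16, 8, 4)]
balanced = rng.standard_normal((4, 8, 8))
outputs = strengthen(pyramid, balanced)  # one output per input level, same shapes
```

No weights appear anywhere in this step, which matches the statement that the procedure is parameter-free.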

The balanced semantic features can be further refined to be more discriminative. The authors found that both direct refinement with convolutions and refinement with a non-local module work well, but the non-local module is more stable. Therefore, embedded Gaussian non-local attention is used by default. The refining step enhances the integrated features and further improves the results.
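To make the refining step concrete, here is a minimal NumPy sketch of an embedded Gaussian non-local block applied to a single feature map. It is a simplification under stated assumptions: the 1x1 convolutions are written as plain matrix multiplications without biases or normalization, the weights are random placeholders rather than trained values, and the function and argument names (`non_local_refine`, `w_theta`, `w_phi`, `w_g`, `w_out`) are hypothetical.

```python
import numpy as np

def softmax(a, axis=-1):
    """Numerically stable softmax along one axis."""
    e = np.exp(a - a.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def non_local_refine(x, w_theta, w_phi, w_g, w_out):
    """Embedded Gaussian non-local block with a residual connection.

    x: (C, H, W); w_theta, w_phi, w_g: (C_mid, C); w_out: (C, C_mid).
    """
    c, h, w = x.shape
    flat = x.reshape(c, h * w)               # (C, N) with N = H * W positions
    theta = w_theta @ flat                   # query embedding, (C_mid, N)
    phi = w_phi @ flat                       # key embedding, (C_mid, N)
    g = w_g @ flat                           # value embedding, (C_mid, N)
    attn = softmax(theta.T @ phi, axis=-1)   # (N, N), each row sums to 1
    y = g @ attn.T                           # y[:, i] = sum_j attn[i, j] * g[:, j]
    return x + (w_out @ y).reshape(c, h, w)  # project back and add the input

# Toy usage: refine an 8-channel, 8 x 8 balanced map with 4 embedding channels.
rng = np.random.default_rng(0)
balanced = rng.standard_normal((8, 8, 8))
w_theta, w_phi, w_g = (0.1 * rng.standard_normal((4, 8)) for _ in range(3))
w_out = 0.1 * rng.standard_normal((8, 4))
refined = non_local_refine(balanced, w_theta, w_phi, w_g, w_out)
```

Every output position is a weighted combination of all positions in the map, so the refinement can draw on global context while keeping the input shape.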

With this method, features from low-level to high-level are aggregated at the same time. The outputs $\{P_{2}, P_{3}, P_{4}, P_{5}\}$ are used for object detection following the same pipeline as in FPN.