What is: Scale Aggregation Block?

A Scale Aggregation Block concatenates feature maps computed at a wide range of scales. The feature maps for each scale are generated by a stack of downsampling, convolution, and upsampling operations. The block is a standard computational module that can replace any given transformation $\mathbf{Y}=\mathbf{T}(\mathbf{X})$, where $\mathbf{X}\in \mathbb{R}^{H\times W\times C}$ and $\mathbf{Y}\in \mathbb{R}^{H\times W\times C_o}$, with $C$ and $C_o$ being the numbers of input and output channels, respectively. $\mathbf{T}$ is any operator, such as a convolution layer or a series of convolution layers. Assume we have $L$ scales. Each scale $l$ is generated by sequentially applying a downsampling operator $\mathbf{D}_l$, a transformation $\mathbf{T}_l$, and an upsampling operator $\mathbf{U}_l$:

$$\mathbf{X}^{'}_l=\mathbf{D}_l(\mathbf{X}),$$

$$\mathbf{Y}^{'}_l=\mathbf{T}_l(\mathbf{X}^{'}_l),$$

$$\mathbf{Y}_l=\mathbf{U}_l(\mathbf{Y}^{'}_l),$$

where $\mathbf{X}^{'}_l\in \mathbb{R}^{H_l\times W_l\times C}$, $\mathbf{Y}^{'}_l\in \mathbb{R}^{H_l\times W_l\times C_l}$, and $\mathbf{Y}_l\in \mathbb{R}^{H\times W\times C_l}$. Notably, $\mathbf{T}_l$ has a structure similar to that of $\mathbf{T}$. Concatenating all $L$ scales gives

$$\mathbf{Y}^{'}=\Vert_{l=1}^{L}\,\mathbf{U}_l(\mathbf{T}_l(\mathbf{D}_l(\mathbf{X}))),$$

where $\Vert$ denotes concatenation of feature maps along the channel dimension, and $\mathbf{Y}^{'} \in \mathbb{R}^{H\times W\times \sum_{l=1}^{L} C_l}$ is the final output feature map of the scale aggregation block.
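The composition above can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the authors' code: the function names, the scale factors, the per-scale channel counts, and the use of a 1x1 convolution as a stand-in for $\mathbf{T}_l$ are all assumptions made for the example. Max pooling and nearest-neighbor resizing are used here for $\mathbf{D}_l$ and $\mathbf{U}_l$, and the input height and width are assumed to be divisible by every factor.

```python
import numpy as np

def max_pool(x, s):
    """D_l: s x s max pooling with stride s on an (H, W, C) array."""
    h, w, c = x.shape
    # Crop so that H and W are divisible by s (an assumption of this sketch).
    x = x[: h - h % s, : w - w % s]
    return x.reshape(h // s, s, w // s, s, c).max(axis=(1, 3))

def nearest_upsample(y, out_h, out_w):
    """U_l: nearest-neighbor resize of an (h, w, C) array to (out_h, out_w, C)."""
    h, w, _ = y.shape
    rows = np.arange(out_h) * h // out_h  # source row for each output row
    cols = np.arange(out_w) * w // out_w  # source column for each output column
    return y[rows][:, cols]

def pointwise_conv(x, weight):
    """Hypothetical stand-in for T_l: a 1x1 convolution (per-pixel linear map)."""
    return x @ weight  # (H_l, W_l, C) @ (C, C_l) -> (H_l, W_l, C_l)

def scale_aggregation_block(x, factors, weights):
    """Concatenate U_l(T_l(D_l(X))) over all scales along the channel axis."""
    h, w, _ = x.shape
    outputs = []
    for s, weight in zip(factors, weights):
        x_l = max_pool(x, s) if s > 1 else x          # X'_l = D_l(X)
        y_l = pointwise_conv(x_l, weight)             # Y'_l = T_l(X'_l)
        outputs.append(nearest_upsample(y_l, h, w))   # Y_l  = U_l(Y'_l)
    return np.concatenate(outputs, axis=-1)

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8, 16))   # H = W = 8, C = 16
factors = [1, 2, 4]                   # L = 3 scales (illustrative values)
channels = [8, 4, 4]                  # C_l for each scale (illustrative values)
weights = [rng.standard_normal((16, c)) for c in channels]

y = scale_aggregation_block(x, factors, weights)
print(y.shape)  # (8, 8, 16): spatial size preserved, channels = sum of C_l
```

Because every branch is resized back to $H\times W$ before concatenation, the block keeps the spatial size of its input, and its output channel count is simply the sum of the per-scale channel counts.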

In the reference implementation, the downsampling $\mathbf{D}_l$ with factor $s$ is implemented by a max-pooling layer with an $s\times s$ kernel and stride $s$. The upsampling $\mathbf{U}_l$ is implemented by resizing with nearest-neighbor interpolation.
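A small worked example may help make these two operators concrete. The snippet below is a NumPy sketch rather than the reference implementation: it applies $2\times 2$ max pooling with stride 2 to a single-channel $4\times 4$ map, then resizes the result back to $4\times 4$. It assumes an integer factor, in which case nearest-neighbor resizing amounts to repeating each value $s$ times along both axes.

```python
import numpy as np

x = np.arange(16, dtype=float).reshape(4, 4)  # single-channel 4 x 4 feature map
s = 2                                         # downsampling factor

# Max pooling with an s x s kernel and stride s:
# split the map into non-overlapping s x s tiles and keep each tile's maximum.
pooled = x.reshape(4 // s, s, 4 // s, s).max(axis=(1, 3))

# Nearest-neighbor resize back to 4 x 4:
# for an integer factor, repeat each value s times along rows and columns.
restored = pooled.repeat(s, axis=0).repeat(s, axis=1)

print(pooled)
# [[ 5.  7.]
#  [13. 15.]]
print(restored)
# [[ 5.  5.  7.  7.]
#  [ 5.  5.  7.  7.]
#  [13. 13. 15. 15.]
#  [13. 13. 15. 15.]]
```

The restored map has the original spatial size but only carries the coarse, pooled values, which is what lets each branch contribute information from a different scale.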

Source	Data-Driven Neuron Allocation for Scale Aggregation Networks
Year	2019
Data Source	CC BY-SA - https://paperswithcode.com