What is: Scale Aggregation Block?

Source: Data-Driven Neuron Allocation for Scale Aggregation Networks
Year: 2000
Data Source: CC BY-SA - https://paperswithcode.com

A Scale Aggregation Block concatenates feature maps at a wide range of scales. Feature maps for each scale are generated by a stack of downsampling, convolution and upsampling operations. The proposed scale aggregation block is a standard computational module which readily replaces any given transformation $\mathbf{Y}=\mathbf{T}(\mathbf{X})$, where $\mathbf{X}\in \mathbb{R}^{H\times W\times C}$ and $\mathbf{Y}\in \mathbb{R}^{H\times W\times C_o}$, with $C$ and $C_o$ being the number of input and output channels respectively. $\mathbf{T}$ is any operator, such as a convolution layer or a series of convolution layers. Assume we have $L$ scales. Each scale $l$ is generated by sequentially applying a downsampling operator $\mathbf{D}_l$, a transformation $\mathbf{T}_l$ and an upsampling operator $\mathbf{U}_l$:

$$\mathbf{X}^{'}_l=\mathbf{D}_l(\mathbf{X}),$$
$$\mathbf{Y}^{'}_l=\mathbf{T}_l(\mathbf{X}^{'}_l),$$
$$\mathbf{Y}_l=\mathbf{U}_l(\mathbf{Y}^{'}_l),$$

where $\mathbf{X}^{'}_l\in \mathbb{R}^{H_l\times W_l\times C}$, $\mathbf{Y}^{'}_l\in \mathbb{R}^{H_l\times W_l\times C_l}$, and $\mathbf{Y}_l\in \mathbb{R}^{H\times W\times C_l}$. Notably, $\mathbf{T}_l$ has a structure similar to that of $\mathbf{T}$. Concatenating all $L$ scales gives

$$\mathbf{Y}^{'}=\Vert^L_1\mathbf{U}_l(\mathbf{T}_l(\mathbf{D}_l(\mathbf{X}))),$$

where $\Vert$ indicates concatenating feature maps along the channel dimension, and $\mathbf{Y}^{'} \in \mathbb{R}^{H\times W\times \sum^L_1 C_l}$ is the final output of the scale aggregation block.
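
To make the shapes concrete, here is a small worked example. All numbers are illustrative assumptions rather than values from the source: an input with $H=W=56$ and $C=64$, $L=3$ scales with hypothetical downsampling factors $1$, $2$, $4$ (so that $H_l = H/s_l$), and per-scale output channels $C_1=32$, $C_2=16$, $C_3=16$.

```latex
% Illustrative values only: H = W = 56, C = 64, L = 3,
% assumed factors s_1 = 1, s_2 = 2, s_3 = 4 and channels C_1 = 32, C_2 = 16, C_3 = 16.
\begin{aligned}
\mathbf{X}^{'}_1 &\in \mathbb{R}^{56\times 56\times 64}, &
\mathbf{Y}^{'}_1 &\in \mathbb{R}^{56\times 56\times 32}, &
\mathbf{Y}_1 &\in \mathbb{R}^{56\times 56\times 32},\\
\mathbf{X}^{'}_2 &\in \mathbb{R}^{28\times 28\times 64}, &
\mathbf{Y}^{'}_2 &\in \mathbb{R}^{28\times 28\times 16}, &
\mathbf{Y}_2 &\in \mathbb{R}^{56\times 56\times 16},\\
\mathbf{X}^{'}_3 &\in \mathbb{R}^{14\times 14\times 64}, &
\mathbf{Y}^{'}_3 &\in \mathbb{R}^{14\times 14\times 16}, &
\mathbf{Y}_3 &\in \mathbb{R}^{56\times 56\times 16},\\
\mathbf{Y}^{'} &\in \mathbb{R}^{56\times 56\times (32+16+16)} = \mathbb{R}^{56\times 56\times 64}.
\end{aligned}
```

Under these assumed values the block keeps the spatial size of the input, and its output channel count is the sum of the per-scale channel counts.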

In the reference implementation, the downsampling $\mathbf{D}_l$ with factor $s$ is implemented by a max pool layer with $s\times s$ kernel size and stride $s$. The upsampling $\mathbf{U}_l$ is implemented by resizing with nearest neighbor interpolation.
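
The block can be sketched in a few lines of NumPy. This is a minimal illustration, not the reference implementation: the function names (`max_pool`, `upsample_nearest`, `scale_aggregation_block`) are hypothetical, and each $\mathbf{T}_l$ is assumed to be a plain 1×1 convolution (a per-pixel matrix multiply) standing in for whatever convolution stack a real network would use. The max pool and nearest neighbor resize follow the description above.

```python
import numpy as np

def max_pool(x, s):
    """D_l: downsample an (H, W, C) array with an s x s max pool, stride s."""
    h, w, c = x.shape
    hs, ws = h // s, w // s
    # Crop so both spatial sizes are multiples of s, then reduce each s x s window.
    x = x[:hs * s, :ws * s, :]
    return x.reshape(hs, s, ws, s, c).max(axis=(1, 3))

def upsample_nearest(y, out_h, out_w):
    """U_l: resize an (h, w, C) array to (out_h, out_w, C) by nearest neighbor."""
    h, w, _ = y.shape
    rows = np.arange(out_h) * h // out_h  # source row for each output row
    cols = np.arange(out_w) * w // out_w  # source column for each output column
    return y[rows][:, cols]

def scale_aggregation_block(x, factors, weights):
    """Concatenate U_l(T_l(D_l(x))) over all scales along the channel axis.

    factors: downsampling factor s for each scale.
    weights: one (C, C_l) matrix per scale; a 1x1 convolution used here as an
             assumed stand-in for T_l.
    """
    h, w, _ = x.shape
    outputs = []
    for s, w_l in zip(factors, weights):
        x_l = max_pool(x, s)                         # (H_l, W_l, C)
        y_l = x_l @ w_l                              # (H_l, W_l, C_l)
        outputs.append(upsample_nearest(y_l, h, w))  # (H, W, C_l)
    return np.concatenate(outputs, axis=-1)          # (H, W, sum of C_l)

# Usage with illustrative sizes: an 8 x 8 input with 3 channels and three scales.
rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8, 3))
factors = (1, 2, 4)
channels = (4, 4, 2)
weights = [rng.standard_normal((3, c_l)) for c_l in channels]
y = scale_aggregation_block(x, factors, weights)
print(y.shape)  # (8, 8, 10)
```

Every branch is resized back to the input resolution before concatenation, so the output channel count is simply the sum of the per-scale channel counts (here 4 + 4 + 2).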