What is: Dimension-wise Convolution?

Source: DiCENet: Dimension-wise Convolutions for Efficient Networks
Year: 2019
Data Source: CC BY-SA - https://paperswithcode.com

A Dimension-wise Convolution, or DimConv, is a type of convolution that encodes depth-wise, width-wise, and height-wise information independently. To achieve this, DimConv extends depthwise convolutions to all dimensions of the input tensor $X \in \mathbb{R}^{D \times H \times W}$, where $W$, $H$, and $D$ correspond to the width, height, and depth of $X$. DimConv has three branches, one per dimension. These branches apply $D$ depth-wise convolutional kernels $k_D \in \mathbb{R}^{1 \times n \times n}$ along depth, $W$ width-wise convolutional kernels $k_W \in \mathbb{R}^{n \times n \times 1}$ along width, and $H$ height-wise convolutional kernels $k_H \in \mathbb{R}^{n \times 1 \times n}$ along height to produce outputs $Y_D$, $Y_W$, and $Y_H \in \mathbb{R}^{D \times H \times W}$ that encode information from all dimensions of the input tensor. The outputs of these independent branches are concatenated along the depth dimension, such that the first spatial planes of $Y_D$, $Y_W$, and $Y_H$ are put together, and so on, to produce the output $Y_{Dim} = \{Y_D, Y_W, Y_H\} \in \mathbb{R}^{3D \times H \times W}$.
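The three branches and the interleaved concatenation can be illustrated with a small NumPy sketch. This is not the DiCENet authors' implementation: the function names `depthwise_conv` and `dim_conv` are hypothetical, and the sketch assumes stride 1, zero "same" padding, an odd kernel size $n$, and cross-correlation (the usual deep-learning meaning of "convolution"). Each kernel is stored as a plain $n \times n$ array, with the singleton axis of $k_D$, $k_W$, and $k_H$ left implicit.

```python
import numpy as np


def depthwise_conv(x, kernels):
    """Apply one n x n kernel per plane along the first axis of x.

    x:       array of shape (C, A, B)
    kernels: array of shape (C, n, n), n odd (assumed)
    Uses zero 'same' padding and stride 1, so the output is (C, A, B).
    """
    _, a, b = x.shape
    n = kernels.shape[-1]
    p = n // 2
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))  # zero-pad the two planar axes
    out = np.zeros(x.shape, dtype=float)
    for i in range(n):
        for j in range(n):
            # Weight each shifted view by that plane's kernel entry (i, j).
            out += kernels[:, i, j, None, None] * xp[:, i:i + a, j:j + b]
    return out


def dim_conv(x, k_d, k_w, k_h):
    """Sketch of DimConv on x of shape (D, H, W).

    k_d: (D, n, n) kernels over the (H, W) plane, one per depth index
    k_w: (W, n, n) kernels over the (D, H) plane, one per width index
    k_h: (H, n, n) kernels over the (D, W) plane, one per height index
    Returns an array of shape (3D, H, W).
    """
    d, h, w = x.shape
    y_d = depthwise_conv(x, k_d)
    # Width branch: bring W to the front, convolve, restore (D, H, W).
    y_w = depthwise_conv(x.transpose(2, 0, 1), k_w).transpose(1, 2, 0)
    # Height branch: bring H to the front, convolve, restore (D, H, W).
    y_h = depthwise_conv(x.transpose(1, 0, 2), k_h).transpose(1, 0, 2)
    # Interleave so planes are ordered Y_D[0], Y_W[0], Y_H[0], Y_D[1], ...
    return np.stack([y_d, y_w, y_h], axis=1).reshape(3 * d, h, w)


rng = np.random.default_rng(0)
D, H, W, n = 4, 5, 6, 3
x = rng.standard_normal((D, H, W))
y = dim_conv(
    x,
    rng.standard_normal((D, n, n)),
    rng.standard_normal((W, n, n)),
    rng.standard_normal((H, n, n)),
)
print(y.shape)  # (12, 5, 6)
```

Stacking the branch outputs on a new axis and then reshaping is one way to get the plane-by-plane grouping described above; a plain `np.concatenate` along depth would instead place all of $Y_D$ before $Y_W$ and $Y_H$.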