What is: Deformable Convolutional Networks?

Source: Deformable Convolutional Networks
Year: 2017
Data Source: CC BY-SA - https://paperswithcode.com

Deformable ConvNets do not learn an affine transformation. Instead, they treat convolution as two steps: first sampling features from the input feature map $X$ on a regular grid $\mathcal{R}$, then aggregating the sampled features by a weighted sum using the convolution kernel $w$. For a 3×3 kernel, the output at location $p_{0}$ can be written as:

\begin{align} Y(p_{0}) &= \sum_{p_i \in \mathcal{R}} w(p_{i}) X(p_{0} + p_{i}) \end{align}

\begin{align} \mathcal{R} &= \{(-1,-1), (-1, 0), \dots, (1, 1)\} \end{align}

Deformable convolution augments the sampling step with a set of learnable offsets $\Delta p_{i}$, one per grid point, which can be generated by a lightweight CNN. Using the offsets $\Delta p_{i}$, deformable convolution can be formulated as:

\begin{align} Y(p_{0}) &= \sum_{p_i \in \mathcal{R}} w(p_{i}) X(p_{0} + p_{i} + \Delta p_{i}). \end{align}

This makes the sampling adaptive. However, $\Delta p_{i}$ is generally fractional, so the location $p_{0} + p_{i} + \Delta p_{i}$ does not fall on the integer grid of $X$. Bilinear interpolation is used to compute the feature value at such locations. The same idea is applied to RoI pooling (deformable RoI pooling), which greatly improves object detection.
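The formulas above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names `bilinear_sample` and `deformable_conv2d`, the single-channel input, the `(H, W, 9, 2)` offset layout, and the zero padding outside the feature map are all assumptions made for illustration. In a real network the offsets would be predicted by a small convolutional layer; here they are simply passed in as an array.

```python
import numpy as np

# Regular 3x3 sampling grid R = {(-1,-1), (-1,0), ..., (1,1)} as (dy, dx) pairs.
GRID = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]


def bilinear_sample(x, py, px):
    """Sample the 2D array x at a fractional location (py, px).

    Locations outside the map contribute zero (assumed zero padding).
    """
    h, w = x.shape
    y0, x0 = int(np.floor(py)), int(np.floor(px))
    value = 0.0
    for yy in (y0, y0 + 1):
        for xx in (x0, x0 + 1):
            if 0 <= yy < h and 0 <= xx < w:
                # Bilinear weight: product of two 1D linear kernels.
                wgt = (1.0 - abs(py - yy)) * (1.0 - abs(px - xx))
                value += wgt * x[yy, xx]
    return value


def deformable_conv2d(x, weight, offsets):
    """Single-channel deformable 3x3 convolution (illustrative sketch).

    x:       (H, W) input feature map X
    weight:  (3, 3) kernel, weight[dy + 1, dx + 1] = w(p_i)
    offsets: (H, W, 9, 2) offsets Δp_i as (dy, dx) per output location
    """
    h, w = x.shape
    y = np.zeros((h, w), dtype=float)
    for r in range(h):
        for c in range(w):
            acc = 0.0
            for i, (dy, dx) in enumerate(GRID):
                off_y, off_x = offsets[r, c, i]
                # Y(p0) += w(p_i) * X(p0 + p_i + Δp_i)
                acc += weight[dy + 1, dx + 1] * bilinear_sample(
                    x, r + dy + off_y, c + dx + off_x
                )
            y[r, c] = acc
    return y


rng = np.random.default_rng(0)
x = rng.standard_normal((5, 5))
weight = rng.standard_normal((3, 3))

# With all offsets set to zero, the result is an ordinary 3x3 convolution.
zero_offsets = np.zeros((5, 5, 9, 2))
y_plain = deformable_conv2d(x, weight, zero_offsets)

# Fractional offsets move each sampling point off the integer grid.
frac_offsets = rng.uniform(-0.5, 0.5, size=(5, 5, 9, 2))
y_deform = deformable_conv2d(x, weight, frac_offsets)
```

The explicit loops are kept for readability; a practical implementation would vectorize the sampling and handle multiple channels and batches.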

Deformable ConvNets adaptively select the important regions and enlarge the effective receptive field of convolutional neural networks, which is important in object detection and semantic segmentation tasks.