What is: Deformable Attention Module?
Source | Deformable DETR: Deformable Transformers for End-to-End Object Detection |
Year | 2020 |
Data Source | CC BY-SA - https://paperswithcode.com |
Deformable Attention Module is an attention module used in the Deformable DETR architecture. It addresses one limitation of standard Transformer attention, namely that it looks over all possible spatial locations in the feature map. Inspired by deformable convolution, the deformable attention module attends only to a small set of key sampling points around a reference point, regardless of the spatial size of the feature maps. By assigning only a small, fixed number of keys to each query, the issues of slow convergence and limited feature spatial resolution can be mitigated.
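Given an input feature map $x \in \mathbb{R}^{C \times H \times W}$, let $q$ index a query element with content feature $z_q$ and a 2-d reference point $p_q$. The deformable attention feature is calculated by:

$$\text{DeformAttn}(z_q, p_q, x) = \sum_{m=1}^{M} W_m \left[ \sum_{k=1}^{K} A_{mqk} \cdot W'_m \, x(p_q + \Delta p_{mqk}) \right]$$

where $m$ indexes the attention head, $k$ indexes the sampled keys, and $K$ is the total sampled key number ($K \ll HW$). $W_m$ and $W'_m$ are the learnable output and value projection weights of head $m$. $\Delta p_{mqk}$ and $A_{mqk}$ denote the sampling offset and attention weight of the $k$-th sampling point in the $m$-th attention head, respectively. The scalar attention weight $A_{mqk}$ lies in the range $[0, 1]$, normalized by $\sum_{k=1}^{K} A_{mqk} = 1$. The offsets $\Delta p_{mqk} \in \mathbb{R}^2$ are 2-d real numbers with unconstrained range. As $p_q + \Delta p_{mqk}$ is fractional, bilinear interpolation is applied as in Dai et al. (2017) in computing $x(p_q + \Delta p_{mqk})$. Both $\Delta p_{mqk}$ and $A_{mqk}$ are obtained via linear projection over the query feature $z_q$. In implementation, the query feature $z_q$ is fed to a linear projection operator of $3MK$ channels, where the first $2MK$ channels encode the sampling offsets $\Delta p_{mqk}$, and the remaining $MK$ channels are fed to a softmax operator to obtain the attention weights $A_{mqk}$.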
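The computation described above can be illustrated with a minimal NumPy sketch for a single query on a single feature map. It is not the reference implementation: the function and parameter names (`bilinear_sample`, `deform_attn`, `W_proj`, `b_proj`, `W_value`, `W_out`) are hypothetical, the reference point is assumed to be in pixel coordinates ordered as (x, y), and samples falling outside the feature map are assumed to contribute zeros.

```python
import numpy as np


def bilinear_sample(x, p):
    """Sample feature map x of shape (C, H, W) at a fractional point p = (px, py).

    px is the column coordinate and py the row coordinate, in pixel units.
    Neighbours outside the map are treated as zeros (an assumption of this sketch).
    """
    C, H, W = x.shape
    px, py = p
    x0, y0 = int(np.floor(px)), int(np.floor(py))
    out = np.zeros(C)
    for yi in (y0, y0 + 1):
        for xi in (x0, x0 + 1):
            if 0 <= xi < W and 0 <= yi < H:
                # Bilinear weight: product of 1-d distances to the neighbour.
                w = (1.0 - abs(px - xi)) * (1.0 - abs(py - yi))
                out += w * x[:, yi, xi]
    return out


def softmax(a, axis=-1):
    a = a - a.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)


def deform_attn(z_q, p_q, x, W_proj, b_proj, W_value, W_out, M, K):
    """Deformable attention for one query with M heads and K sampled keys.

    z_q:     (C,) query content feature
    p_q:     (2,) reference point (x, y) in pixel units
    x:       (C, H, W) input feature map
    W_proj:  (3*M*K, C) and b_proj: (3*M*K,), the 3MK-channel projection
    W_value: (M, C_v, C) per-head value projections (W'_m in the formula)
    W_out:   (M, C, C_v) per-head output projections (W_m in the formula)
    """
    proj = W_proj @ z_q + b_proj
    # First 2MK channels: sampling offsets Delta p_{mqk}, unconstrained range.
    offsets = proj[: 2 * M * K].reshape(M, K, 2)
    # Remaining MK channels: softmax over the K keys of each head gives A_{mqk}.
    A = softmax(proj[2 * M * K:].reshape(M, K), axis=-1)

    out = np.zeros(W_out.shape[1])
    for m in range(M):
        head = np.zeros(W_value.shape[1])
        for k in range(K):
            sampled = bilinear_sample(x, p_q + offsets[m, k])
            head += A[m, k] * (W_value[m] @ sampled)
        out += W_out[m] @ head
    return out, offsets, A


# Small usage example with random weights.
rng = np.random.default_rng(0)
C, H, W, M, K = 8, 5, 6, 2, 4
C_v = C // M
x = rng.normal(size=(C, H, W))
z_q = rng.normal(size=C)
p_q = np.array([2.5, 1.5])
out, offsets, A = deform_attn(
    z_q, p_q, x,
    rng.normal(size=(3 * M * K, C)), np.zeros(3 * M * K),
    rng.normal(size=(M, C_v, C)), rng.normal(size=(M, C, C_v)),
    M, K,
)
print(out.shape, A.sum(axis=-1))
```

Each query touches only $MK$ sampled locations, so the cost per query does not grow with $H \times W$. The explicit Python loops are kept for readability; a practical implementation would batch the queries and vectorize the sampling.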