What is: Deformable Attention Module?

The Deformable Attention Module is the attention module used in the Deformable DETR architecture. It addresses one issue with standard Transformer attention: every query looks over all possible spatial locations. Inspired by deformable convolution, the deformable attention module attends only to a small set of key sampling points around a reference point, regardless of the spatial size of the feature maps. By assigning only a small, fixed number of keys to each query, the issues of convergence and feature spatial resolution can be mitigated.

Given an input feature map $\mathbf{x} \in \mathbb{R}^{C \times H \times W}$, let $q$ index a query element with content feature $\mathbf{z}_{q}$ and a 2-d reference point $\mathbf{p}_{q}$. The deformable attention feature is calculated by:

$$\text{DeformAttn}\left(\mathbf{z}_{q}, \mathbf{p}_{q}, \mathbf{x}\right)=\sum_{m=1}^{M} \mathbf{W}_{m}\left[\sum_{k=1}^{K} A_{m q k} \cdot \mathbf{W}_{m}^{\prime} \mathbf{x}\left(\mathbf{p}_{q}+\Delta \mathbf{p}_{m q k}\right)\right]$$

where $m$ indexes the attention head, $k$ indexes the sampled keys, and $K$ is the total number of sampled keys $(K \ll H W)$. $\mathbf{W}_{m}^{\prime}$ and $\mathbf{W}_{m}$ are the learnable value and output projection weights of the $m^{\text{th}}$ head. $\Delta \mathbf{p}_{m q k}$ and $A_{m q k}$ denote the sampling offset and attention weight of the $k^{\text{th}}$ sampling point in the $m^{\text{th}}$ attention head, respectively. The scalar attention weight $A_{m q k}$ lies in the range $[0,1]$ and is normalized so that $\sum_{k=1}^{K} A_{m q k}=1$. The offsets $\Delta \mathbf{p}_{m q k} \in \mathbb{R}^{2}$ are 2-d real vectors with unconstrained range. As $\mathbf{p}_{q}+\Delta \mathbf{p}_{m q k}$ is fractional, bilinear interpolation is applied, as in Dai et al. (2017), to compute $\mathbf{x}\left(\mathbf{p}_{q}+\Delta \mathbf{p}_{m q k}\right)$. Both $\Delta \mathbf{p}_{m q k}$ and $A_{m q k}$ are obtained via linear projection over the query feature $\mathbf{z}_{q}$. In implementation, the query feature $\mathbf{z}_{q}$ is fed to a linear projection operator of $3 M K$ channels, where the first $2 M K$ channels encode the sampling offsets $\Delta \mathbf{p}_{m q k}$, and the remaining $M K$ channels are fed to a softmax operator to obtain the attention weights $A_{m q k}$.
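The computation described above can be sketched for a single query in NumPy. This is a minimal illustration, not the reference implementation: all function and parameter names are hypothetical, the reference point is assumed to be given in pixel coordinates as (x, y), samples falling outside the feature map are assumed to contribute zeros, and the ordering of channels inside the $3MK$-channel projection (reshaped here to `(M, K, 2)` and `(M, K)`) is an assumption. Per-head value and output projections (`W_val`, `W_out`) are included as in standard multi-head attention.

```python
import numpy as np

def bilinear_sample(x, px, py):
    """Bilinearly interpolate x (C, H, W) at fractional location (px, py).
    Assumption: neighbours outside the map contribute zero."""
    C, H, W = x.shape
    x0, y0 = int(np.floor(px)), int(np.floor(py))
    out = np.zeros(C)
    for yi in (y0, y0 + 1):
        for xi in (x0, x0 + 1):
            if 0 <= xi < W and 0 <= yi < H:
                w = (1 - abs(px - xi)) * (1 - abs(py - yi))
                out += w * x[:, yi, xi]
    return out

def softmax(a, axis=-1):
    a = a - a.max(axis=axis, keepdims=True)  # subtract max for stability
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)

def deformable_attention(z_q, p_q, x, W_proj, b_proj, W_val, W_out, M, K):
    """Single-query deformable attention (hypothetical helper).
    z_q: (C,) query feature      p_q: (2,) reference point (x, y)
    x: (C, H, W) feature map     W_proj: (3*M*K, C), b_proj: (3*M*K,)
    W_val: (M, C_v, C) per-head value projections
    W_out: (M, C, C_v) per-head output projections
    """
    proj = W_proj @ z_q + b_proj                  # 3MK channels
    # First 2MK channels: sampling offsets (layout assumed to be M, K, 2).
    offsets = proj[: 2 * M * K].reshape(M, K, 2)
    # Remaining MK channels: softmax over the K samples of each head.
    A = softmax(proj[2 * M * K:].reshape(M, K), axis=-1)

    out = np.zeros(x.shape[0])
    for m in range(M):
        head = np.zeros(W_val.shape[1])
        for k in range(K):
            px, py = p_q + offsets[m, k]          # fractional location
            v = bilinear_sample(x, px, py)        # sampled feature
            head += A[m, k] * (W_val[m] @ v)
        out += W_out[m] @ head
    return out, offsets, A

# Usage with random weights (values are illustrative only).
rng = np.random.default_rng(0)
C, H, W, M, K = 8, 16, 16, 2, 4
C_v = C // M
x = rng.standard_normal((C, H, W))
z_q = rng.standard_normal(C)
p_q = np.array([7.5, 4.25])
W_proj = 0.1 * rng.standard_normal((3 * M * K, C))
b_proj = np.zeros(3 * M * K)
W_val = rng.standard_normal((M, C_v, C))
W_out = rng.standard_normal((M, C, C_v))
out, offsets, A = deformable_attention(
    z_q, p_q, x, W_proj, b_proj, W_val, W_out, M, K
)
print(out.shape, offsets.shape, A.sum(axis=-1))
```

Note that the cost per query depends only on $M$ and $K$, not on $H \times W$: each query touches $M K$ sampled locations, whatever the size of the feature map.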

Source	Deformable DETR: Deformable Transformers for End-to-End Object Detection
Year	2020
Data Source	CC BY-SA - https://paperswithcode.com

Viet-Anh on Software
