
What is: Embedded Gaussian Affinity?

Source: Non-local Neural Networks
Year: 2017
Data Source: CC BY-SA - https://paperswithcode.com

Embedded Gaussian Affinity is a type of affinity (self-similarity) function between two points $\mathbf{x}_i$ and $\mathbf{x}_j$ that uses a Gaussian function in an embedding space:

$$f\left(\mathbf{x}_i, \mathbf{x}_j\right) = e^{\theta\left(\mathbf{x}_i\right)^{T}\phi\left(\mathbf{x}_j\right)}$$

Here $\theta\left(\mathbf{x}_i\right) = W_\theta \mathbf{x}_i$ and $\phi\left(\mathbf{x}_j\right) = W_\phi \mathbf{x}_j$ are two embeddings.
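
As a concrete illustration, here is a minimal NumPy sketch of this affinity, assuming linear embeddings $W_\theta$ and $W_\phi$; the dimensions and random weights are chosen purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

d, d_k = 8, 4  # input and embedding dimensions (illustrative)
W_theta = rng.standard_normal((d_k, d))
W_phi = rng.standard_normal((d_k, d))

def embedded_gaussian_affinity(x_i, x_j):
    """f(x_i, x_j) = exp(theta(x_i)^T phi(x_j)) with linear embeddings."""
    theta_i = W_theta @ x_i  # theta(x_i) = W_theta x_i
    phi_j = W_phi @ x_j      # phi(x_j) = W_phi x_j
    return np.exp(theta_i @ phi_j)

x_i, x_j = rng.standard_normal(d), rng.standard_normal(d)
print(embedded_gaussian_affinity(x_i, x_j))
```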

Note that the self-attention module used in the original Transformer model is a special case of non-local operations in the embedded Gaussian version. This can be seen from the fact that, for a given $i$, $\frac{1}{\mathcal{C}\left(\mathbf{x}\right)}\sum_{\forall j} f\left(\mathbf{x}_i, \mathbf{x}_j\right) g\left(\mathbf{x}_j\right)$ becomes a softmax computation along the dimension $j$. We therefore have $\mathbf{y} = \text{softmax}\left(\mathbf{x}^{T} W_\theta^{T} W_\phi \mathbf{x}\right) g\left(\mathbf{x}\right)$, which is the self-attention form in the Transformer model. This shows how the recent self-attention model relates to the classic computer vision method of non-local means.
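
To make this equivalence concrete, the following NumPy sketch checks that normalizing the pairwise affinities by $\mathcal{C}\left(\mathbf{x}\right) = \sum_{\forall j} f\left(\mathbf{x}_i, \mathbf{x}_j\right)$ gives exactly a row-wise softmax over the scores $\theta\left(\mathbf{x}_i\right)^{T}\phi\left(\mathbf{x}_j\right)$. The random weights and the linear form $g\left(\mathbf{x}_j\right) = W_g \mathbf{x}_j$ are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, d_k = 5, 8, 4              # positions, feature dim, embedding dim (illustrative)
X = rng.standard_normal((d, n))  # columns are positions x_1 .. x_n
W_theta = rng.standard_normal((d_k, d))
W_phi = rng.standard_normal((d_k, d))
W_g = rng.standard_normal((d, d))  # g(x_j) = W_g x_j (linear, as an assumption)

# Pairwise scores: entry (i, j) = theta(x_i)^T phi(x_j)
scores = (W_theta @ X).T @ (W_phi @ X)  # shape (n, n)
f = np.exp(scores)                      # embedded Gaussian affinities

# Non-local operation: y_i = (1 / C(x)) * sum_j f(x_i, x_j) g(x_j)
C = f.sum(axis=1, keepdims=True)
y_nonlocal = (f / C) @ (W_g @ X).T      # shape (n, d)

# The same computation written as softmax attention along dimension j
def softmax(s, axis=-1):
    e = np.exp(s - s.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

y_attention = softmax(scores, axis=1) @ (W_g @ X).T
print(np.allclose(y_nonlocal, y_attention))  # True: the two forms coincide
```

The normalization by $\mathcal{C}\left(\mathbf{x}\right)$ and the softmax denominator are the same sum of exponentials, which is why the two computations agree exactly.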