
What is: Dot-Product Attention?

Source: Effective Approaches to Attention-based Neural Machine Translation
Year: 2015
Data Source: CC BY-SA - https://paperswithcode.com

Dot-Product Attention is an attention mechanism where the alignment score function is calculated as:

$$f_{att}\left(\textbf{h}_{i}, \textbf{s}_{j}\right) = \textbf{h}_{i}^{T}\textbf{s}_{j}$$

It is equivalent to multiplicative attention without a trainable weight matrix (or, equivalently, with that matrix fixed to the identity). Because the score is a plain dot product, $\textbf{h}_{i}$ and $\textbf{s}_{j}$ must have the same dimension. Here $\textbf{h}$ refers to the hidden states of the encoder, and $\textbf{s}$ refers to the hidden states of the decoder. The function above is thus a type of alignment score function.
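As an illustration, the score and its equivalence to multiplicative attention with an identity weight matrix can be checked numerically. This is a minimal sketch using plain Python lists; the helper names (`dot_score`, `multiplicative_score`, `identity`) and the example vectors are hypothetical, not taken from the paper.

```python
def dot_score(h, s):
    """Dot-product alignment score f_att(h_i, s_j) = h_i^T s_j."""
    if len(h) != len(s):
        raise ValueError("h and s must have the same dimension")
    return sum(a * b for a, b in zip(h, s))


def multiplicative_score(h, W, s):
    """Multiplicative score h_i^T W s_j, with W given as a list of rows."""
    Ws = [sum(w * x for w, x in zip(row, s)) for row in W]  # matrix-vector product W s_j
    return dot_score(h, Ws)


def identity(n):
    """n x n identity matrix as a list of rows."""
    return [[1.0 if r == c else 0.0 for c in range(n)] for r in range(n)]


h_i = [1.0, 2.0, 3.0]   # hypothetical encoder hidden state
s_j = [0.5, -1.0, 2.0]  # hypothetical decoder hidden state

print(dot_score(h_i, s_j))                          # 1*0.5 + 2*(-1) + 3*2 = 4.5
print(multiplicative_score(h_i, identity(3), s_j))  # 4.5: W = I reduces to the dot product
```

Replacing `identity(3)` with a learned matrix would give general multiplicative attention; dot-product attention simply skips that matrix and its parameters.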

Within a neural network, once we have the alignment scores, we apply a softmax function to them to obtain the final attention weights, which are non-negative and sum to 1.
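The softmax step can be sketched as follows, assuming (as is usual for encoder-decoder attention) that a single decoder state $\textbf{s}_{j}$ is scored against every encoder state $\textbf{h}_{1}, \dots, \textbf{h}_{n}$ and the softmax normalizes over those encoder positions. The function names and example vectors are hypothetical, chosen only for illustration.

```python
import math


def dot_score(h, s):
    """Dot-product alignment score h_i^T s_j."""
    return sum(a * b for a, b in zip(h, s))


def softmax(scores):
    """Softmax over a list of scores; subtracting the max avoids overflow without changing the result."""
    m = max(scores)
    exps = [math.exp(x - m) for x in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(encoder_states, s_j):
    """Score s_j against each encoder state h_i, then normalize the scores with softmax."""
    scores = [dot_score(h_i, s_j) for h_i in encoder_states]
    return softmax(scores)


encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # hypothetical h_1, h_2, h_3
s_j = [2.0, 0.0]                                       # hypothetical decoder state

weights = attention_weights(encoder_states, s_j)  # scores are [2.0, 0.0, 2.0]
print([round(w, 3) for w in weights])             # [0.468, 0.063, 0.468]
print(sum(weights))                               # approximately 1.0
```

Encoder states that align more closely with $\textbf{s}_{j}$ (here $\textbf{h}_{1}$ and $\textbf{h}_{3}$) receive larger weights, while the weights as a whole still sum to 1.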