
What is: Additive Attention?

Source: Neural Machine Translation by Jointly Learning to Align and Translate
Year: 2014
Data Source: CC BY-SA - https://paperswithcode.com

Additive Attention, also known as Bahdanau Attention, uses a one-hidden-layer feed-forward network to calculate the attention alignment score:

$$f_{att}\left(\textbf{h}_{i}, \textbf{s}_{j}\right) = \textbf{v}_{a}^{T}\tanh\left(\textbf{W}_{a}\left[\textbf{h}_{i}; \textbf{s}_{j}\right]\right)$$

where $\textbf{v}_{a}$ and $\textbf{W}_{a}$ are learned attention parameters. Here $\textbf{h}$ refers to the hidden states of the encoder, and $\textbf{s}$ to the hidden states of the decoder. The function above is thus a type of alignment score function. A matrix of alignment scores can be used to visualize the correlation between source and target words.
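
A minimal sketch of this scorer in PyTorch (the module name, dimension arguments, and tensor shapes are illustrative assumptions, not taken from the paper):

```python
import torch
import torch.nn as nn

class AdditiveAttentionScore(nn.Module):
    """One-hidden-layer feed-forward scorer: v_a^T tanh(W_a [h_i; s_j])."""

    def __init__(self, enc_dim: int, dec_dim: int, attn_dim: int):
        super().__init__()
        # W_a maps the concatenated [h_i; s_j] into the attention space.
        self.W_a = nn.Linear(enc_dim + dec_dim, attn_dim, bias=False)
        # v_a projects the tanh-activated hidden layer down to a scalar score.
        self.v_a = nn.Linear(attn_dim, 1, bias=False)

    def forward(self, h: torch.Tensor, s: torch.Tensor) -> torch.Tensor:
        # h: (batch, src_len, enc_dim) -- encoder hidden states h_i
        # s: (batch, dec_dim)          -- one decoder hidden state s_j
        src_len = h.size(1)
        # Repeat s_j along the source axis so it pairs with every h_i.
        s_expanded = s.unsqueeze(1).expand(-1, src_len, -1)
        concat = torch.cat([h, s_expanded], dim=-1)      # [h_i; s_j]
        scores = self.v_a(torch.tanh(self.W_a(concat)))  # (batch, src_len, 1)
        return scores.squeeze(-1)                        # (batch, src_len)
```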

Within a neural network, once we have the alignment scores, we compute the final attention weights by applying a softmax over the alignment scores, which ensures the weights sum to 1.
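
Continuing the sketch above, the weights from the softmax can then be used to take a weighted sum of the encoder states (the context vector); the shapes below are again illustrative:

```python
import torch
import torch.nn.functional as F

# Uses the AdditiveAttentionScore module defined in the previous sketch.
scorer = AdditiveAttentionScore(enc_dim=8, dec_dim=8, attn_dim=16)
h = torch.randn(2, 5, 8)   # encoder hidden states (batch=2, src_len=5)
s = torch.randn(2, 8)      # current decoder hidden state

scores = scorer(h, s)                  # raw alignment scores
weights = F.softmax(scores, dim=-1)    # attention weights, sum to 1 over source
context = torch.bmm(weights.unsqueeze(1), h).squeeze(1)  # weighted sum of h_i
```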