What is: Additive Attention?
Source | Neural Machine Translation by Jointly Learning to Align and Translate |
Year | 2014 |
Data Source | CC BY-SA - https://paperswithcode.com |
Additive Attention, also known as Bahdanau Attention, uses a one-hidden-layer feed-forward network to calculate the attention alignment score:

$$f_{att}\left(\mathbf{h}_{i}, \mathbf{s}_{j}\right) = v_{a}^{T}\tanh\left(\mathbf{W}_{a}\left[\mathbf{h}_{i};\mathbf{s}_{j}\right]\right)$$

where $v_{a}$ and $\mathbf{W}_{a}$ are learned attention parameters. Here $\mathbf{h}_{i}$ refers to the hidden states of the encoder, and $\mathbf{s}_{j}$ to the hidden states of the decoder. The function above is thus a type of alignment score function. We can use a matrix of alignment scores to show the correlation between source and target words, as the Figure to the right shows.
Within a neural network, once we have the alignment scores, we obtain the final attention weights by applying a softmax function to them, which ensures they sum to 1.
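The computation above can be sketched in NumPy. This is a minimal illustration, not the original implementation: the dimensions are arbitrary, and the parameters `W_a` and `v_a` are randomly initialized here rather than learned.

```python
import numpy as np

rng = np.random.default_rng(0)

d_h, d_s, d_a = 4, 4, 8   # encoder, decoder, and attention dims (illustrative)
T = 5                     # number of encoder time steps

# Attention parameters (learned in practice; random here for illustration)
W_a = rng.standard_normal((d_a, d_h + d_s))
v_a = rng.standard_normal(d_a)

def additive_scores(H, s):
    """Alignment scores v_a^T tanh(W_a [h_i; s]) for each encoder state h_i."""
    concat = np.concatenate([H, np.tile(s, (H.shape[0], 1))], axis=1)  # (T, d_h + d_s)
    return np.tanh(concat @ W_a.T) @ v_a                               # (T,)

def softmax(x):
    e = np.exp(x - x.max())  # subtract max for numerical stability
    return e / e.sum()

H = rng.standard_normal((T, d_h))  # encoder hidden states, one row per time step
s = rng.standard_normal(d_s)       # a single decoder hidden state

weights = softmax(additive_scores(H, s))  # attention weights, sum to 1
context = weights @ H                     # context vector: weighted sum of encoder states
```

The context vector is then fed to the decoder at each step, letting it attend to different source positions when generating each target word.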