What is: Location Sensitive Attention?
Source | Attention-Based Models for Speech Recognition |
Year | 2015 |
Data Source | CC BY-SA - https://paperswithcode.com |
Location Sensitive Attention is an attention mechanism that extends the additive attention mechanism to use cumulative attention weights from previous decoder time steps as an additional feature. This encourages the model to move forward consistently through the input, mitigating potential failure modes where some subsequences are repeated or ignored by the decoder.
Starting with additive attention, where $h$ is a sequential representation from a BiRNN encoder and $s_{i-1}$ is the $(i-1)$-th state of a recurrent neural network (e.g. an LSTM or GRU):

$$e_{i,j} = w^{T}\tanh\left(Ws_{i-1} + Vh_{j} + b\right)$$
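As a concrete reference, here is a minimal NumPy sketch of this additive score for a single decoder step. The function name `additive_score` and all shape choices are illustrative assumptions, not from the paper:

```python
import numpy as np

def additive_score(s_prev, h, W, V, w, b):
    """Additive attention scores e_{i,j} for a single decoder step i.

    s_prev: (d_s,)   previous decoder state s_{i-1}
    h:      (T, d_h) encoder representations h_j for all positions j
    W:      (d_a, d_s), V: (d_a, d_h), w: (d_a,), b: (d_a,)
    Returns a (T,) vector of unnormalized scores.
    """
    # e_{i,j} = w^T tanh(W s_{i-1} + V h_j + b), vectorized over all j
    return np.tanh(W @ s_prev + h @ V.T + b) @ w
```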
where $w$ and $b$ are vectors, $W$ and $V$ are matrices. We extend this to be location-aware by making it take into account the alignment produced at the previous step. First, we extract $k$ vectors $f_{i,j} \in \mathbb{R}^{k}$ for every position $j$ of the previous alignment $\alpha_{i-1}$ by convolving it with a matrix $F \in \mathbb{R}^{k \times r}$:

$$f_{i} = F * \alpha_{i-1}$$
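One possible NumPy rendering of this convolution is sketched below, assuming "same" padding so that each encoder position $j$ keeps one $k$-dimensional feature vector; the padding choice is an implementation detail the formula leaves open:

```python
def location_features(alpha_prev, F):
    """Compute f_i = F * alpha_{i-1}: one k-vector f_{i,j} per position j.

    alpha_prev: (T,)   alignment weights from the previous decoder step
    F:          (k, r) bank of k one-dimensional filters of width r
    Returns a (T, k) array of location features.
    """
    # One 1-D convolution per filter row; mode='same' keeps the length at T.
    return np.stack(
        [np.convolve(alpha_prev, F[c], mode="same") for c in range(F.shape[0])],
        axis=1,
    )
```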
These additional vectors $f_{i,j}$ are then used by the scoring mechanism $e_{i,j}$:

$$e_{i,j} = w^{T}\tanh\left(Ws_{i-1} + Vh_{j} + Uf_{i,j} + b\right)$$
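Putting the pieces together, the following sketch computes the full location-sensitive score and one attention step. The extra matrix $U \in \mathbb{R}^{d_a \times k}$ mixes in the location features; all dimensions in the demo are arbitrary:

```python
def location_sensitive_score(s_prev, h, alpha_prev, W, V, U, w, b, F):
    """e_{i,j} = w^T tanh(W s_{i-1} + V h_j + U f_{i,j} + b)."""
    f = location_features(alpha_prev, F)                 # (T, k)
    return np.tanh(W @ s_prev + h @ V.T + f @ U.T + b) @ w

# Toy usage (shapes are illustrative). The softmax of the scores yields
# the new alignment, which feeds the next decoder step. Note that some
# follow-up systems (e.g. Tacotron 2) convolve the cumulative sum of all
# past alignments rather than alpha_{i-1} alone, matching the cumulative
# weights mentioned in the description above.
rng = np.random.default_rng(0)
T, d_s, d_h, d_a, k, r = 7, 4, 6, 5, 3, 5
W = rng.normal(size=(d_a, d_s)); V = rng.normal(size=(d_a, d_h))
U = rng.normal(size=(d_a, k));   F = rng.normal(size=(k, r))
w = rng.normal(size=d_a);        b = rng.normal(size=d_a)
s_prev, h = rng.normal(size=d_s), rng.normal(size=(T, d_h))
alpha_prev = np.full(T, 1.0 / T)                         # uniform start
e = location_sensitive_score(s_prev, h, alpha_prev, W, V, U, w, b, F)
alpha = np.exp(e - e.max()); alpha /= alpha.sum()        # new alignment over j
```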