
What is: Location Sensitive Attention?

Source: Attention-Based Models for Speech Recognition
Year: 2015
Data Source: CC BY-SA - https://paperswithcode.com

Location Sensitive Attention is an attention mechanism that extends additive attention to use cumulative attention weights from previous decoder time steps as an additional feature. This encourages the model to move forward consistently through the input, mitigating potential failure modes where some subsequences are repeated or ignored by the decoder.

Starting with additive attention, where $h$ is a sequential representation from a BiRNN encoder and $s_{i-1}$ is the $(i-1)$-th state of a recurrent neural network (e.g. an LSTM or GRU):

$$e_{i,j} = w^{T}\tanh\left(Ws_{i-1} + Vh_{j} + b\right)$$
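As a point of reference, this additive (content-based) scoring can be sketched as a small PyTorch module. This is a minimal illustration rather than the paper's implementation; the dimension names (`query_dim`, `encoder_dim`, `attn_dim`) are assumptions introduced here.

```python
import torch
import torch.nn as nn


class AdditiveAttention(nn.Module):
    """Additive scoring: e_{i,j} = w^T tanh(W s_{i-1} + V h_j + b)."""

    def __init__(self, query_dim, encoder_dim, attn_dim):
        super().__init__()
        self.W = nn.Linear(query_dim, attn_dim, bias=True)      # W s_{i-1} + b
        self.V = nn.Linear(encoder_dim, attn_dim, bias=False)   # V h_j
        self.w = nn.Linear(attn_dim, 1, bias=False)              # w^T (...)

    def forward(self, s_prev, h):
        # s_prev: (batch, query_dim)       previous decoder state s_{i-1}
        # h:      (batch, T, encoder_dim)  encoder outputs h_j
        e = self.w(torch.tanh(self.W(s_prev).unsqueeze(1) + self.V(h)))
        return e.squeeze(-1)               # (batch, T) unnormalised scores e_{i,j}
```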

Here $w$ and $b$ are vectors, and $W$ and $V$ are matrices. We extend this mechanism to be location-aware by making it take into account the alignment produced at the previous step. First, we extract $k$ vectors $f_{i,j} \in \mathbb{R}^{k}$ for every position $j$ of the previous alignment $\alpha_{i-1}$ by convolving it with a matrix $F \in \mathbb{R}^{k \times r}$:

$$f_{i} = F \ast \alpha_{i-1}$$
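In practice this can be implemented as a 1-D convolution over the previous alignment with $k$ output channels and filter width $r$, using same-padding so that each $f_{i,j}$ lines up with encoder position $j$. The snippet below is a sketch; the values of `k_filters` and `kernel_size` are illustrative assumptions, not taken from the paper.

```python
import torch
import torch.nn as nn

# k filters of width r (illustrative values)
k_filters, kernel_size = 32, 31
location_conv = nn.Conv1d(in_channels=1, out_channels=k_filters,
                          kernel_size=kernel_size,
                          padding=(kernel_size - 1) // 2, bias=False)

alpha_prev = torch.softmax(torch.randn(4, 100), dim=-1)  # (batch, T) previous alignment alpha_{i-1}
f = location_conv(alpha_prev.unsqueeze(1))                # (batch, k, T) convolved alignment
f = f.transpose(1, 2)                                     # (batch, T, k): one f_{i,j} per position j
```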

These additional vectors $f_{i,j}$ are then used by the scoring mechanism $e_{i,j}$:

$$e_{i,j} = w^{T}\tanh\left(Ws_{i-1} + Vh_{j} + Uf_{i,j} + b\right)$$
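Putting the two pieces together, the location-sensitive score adds a projection $Uf_{i,j}$ of the location features to the additive score, and the new alignment $\alpha_{i}$ is obtained by normalising the scores (a plain softmax here). The following PyTorch module is a minimal sketch under the same illustrative assumptions as above; layer sizes and filter width are not prescribed by the paper.

```python
import torch
import torch.nn as nn


class LocationSensitiveAttention(nn.Module):
    """e_{i,j} = w^T tanh(W s_{i-1} + V h_j + U f_{i,j} + b); alpha_i = softmax(e_i)."""

    def __init__(self, query_dim, encoder_dim, attn_dim, k_filters=32, kernel_size=31):
        super().__init__()
        self.W = nn.Linear(query_dim, attn_dim, bias=True)      # W s_{i-1} + b
        self.V = nn.Linear(encoder_dim, attn_dim, bias=False)   # V h_j
        self.U = nn.Linear(k_filters, attn_dim, bias=False)     # U f_{i,j}
        self.w = nn.Linear(attn_dim, 1, bias=False)              # w^T (...)
        self.location_conv = nn.Conv1d(1, k_filters, kernel_size,
                                       padding=(kernel_size - 1) // 2, bias=False)

    def forward(self, s_prev, h, alpha_prev):
        # s_prev:     (batch, query_dim)       previous decoder state s_{i-1}
        # h:          (batch, T, encoder_dim)  encoder outputs h_j
        # alpha_prev: (batch, T)               previous alignment alpha_{i-1}
        f = self.location_conv(alpha_prev.unsqueeze(1)).transpose(1, 2)  # (batch, T, k)
        e = self.w(torch.tanh(self.W(s_prev).unsqueeze(1) + self.V(h) + self.U(f)))
        alpha = torch.softmax(e.squeeze(-1), dim=-1)             # new alignment alpha_i
        context = torch.bmm(alpha.unsqueeze(1), h).squeeze(1)    # c_i = sum_j alpha_{i,j} h_j
        return context, alpha


# Example usage with arbitrary sizes:
attn = LocationSensitiveAttention(query_dim=256, encoder_dim=512, attn_dim=128)
context, alpha = attn(torch.randn(4, 256),
                      torch.randn(4, 100, 512),
                      torch.softmax(torch.randn(4, 100), dim=-1))
```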