What is: Deep LSTM Reader?

Source: Teaching Machines to Read and Comprehend
Year: 2015
Data Source: CC BY-SA - https://paperswithcode.com

The Deep LSTM Reader is a neural network for reading comprehension. We feed documents one word at a time into a Deep LSTM encoder; after a delimiter, we then also feed the query into the same encoder. The model therefore processes each document-query pair as a single long sequence. Given the embedded document and query, the network predicts which token in the document answers the query.
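To make the input layout concrete, here is a minimal sketch of how a document and a query might be joined into one sequence before embedding. The whitespace tokenization, the helper names `build_input_sequence` and `build_vocab`, and the toy sentences are illustrative assumptions, not part of the original model description; only the delimiter `|||` comes from the text.

```python
DELIM = "|||"  # delimiter token named in the text


def build_input_sequence(document, query, delimiter=DELIM):
    """Return the document tokens, then the delimiter, then the query tokens."""
    return list(document) + [delimiter] + list(query)


def build_vocab(tokens):
    """Map each distinct token to an integer id, in order of first appearance."""
    vocab = {}
    for tok in tokens:
        vocab.setdefault(tok, len(vocab))
    return vocab


# Toy example (assumed data): whitespace-tokenized document and query.
document = "the cat sat on the mat".split()
query = "where did the cat sit".split()

sequence = build_input_sequence(document, query)
vocab = build_vocab(sequence)
token_ids = [vocab[tok] for tok in sequence]  # ids that an embedding layer would consume
```

The encoder then sees `token_ids` as one long sequence, with the delimiter marking where the document ends and the query begins.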

The model consists of a Deep LSTM cell with skip connections from each input $x(t)$ to every hidden layer, and from every hidden layer to the output $y(t)$:

$$x'(t, k) = x(t) \,||\, y'(t, k - 1), \qquad y(t) = y'(t, 1) \,||\, \dots \,||\, y'(t, K)$$

$$i(t, k) = \sigma\left(W_{kxi} x'(t, k) + W_{khi} h(t - 1, k) + W_{kci} c(t - 1, k) + b_{ki}\right)$$

$$f(t, k) = \sigma\left(W_{kxf} x(t) + W_{khf} h(t - 1, k) + W_{kcf} c(t - 1, k) + b_{kf}\right)$$

$$c(t, k) = f(t, k)\, c(t - 1, k) + i(t, k) \tanh\left(W_{kxc} x'(t, k) + W_{khc} h(t - 1, k) + b_{kc}\right)$$

$$o(t, k) = \sigma\left(W_{kxo} x'(t, k) + W_{kho} h(t - 1, k) + W_{kco} c(t, k) + b_{ko}\right)$$

$$h(t, k) = o(t, k) \tanh\left(c(t, k)\right)$$

$$y'(t, k) = W_{ky} h(t, k) + b_{ky}$$

where $||$ indicates vector concatenation, $\sigma$ is the logistic sigmoid, $h(t, k)$ is the hidden state for layer $k$ at time $t$, $c(t, k)$ is the corresponding memory cell, and $i$, $f$, $o$ are the input, forget, and output gates respectively. Thus our Deep LSTM Reader is defined by $g^{\text{LSTM}}(d, q) = y(|d| + |q|)$, with input $x(t)$ the concatenation of $d$ and $q$ separated by the delimiter $|||$.
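The recurrence above can be sketched in NumPy as follows. This is a hedged illustration rather than a reference implementation: it assumes the gates use the logistic sigmoid (standard for LSTMs), that layer 1 receives $x(t)$ alone because $y'(t, 0)$ is not defined, that the cell-to-gate weights are full matrices, and that states start at zero. The forget gate reads $x(t)$ rather than $x'(t, k)$, exactly as the equations are written. The names `init_layer`, `deep_lstm_step`, `g_lstm`, the random initialization, and all dimensions are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def init_layer(rng, d_in, d_y, d_h, k):
    """Random parameters for layer k (1-based); sizes are illustrative assumptions."""
    d_x = d_in if k == 1 else d_in + d_y  # size of x'(t, k)

    def w(rows, cols):
        return rng.normal(0.0, 0.1, size=(rows, cols))

    # The forget gate reads x(t) only, as written in the equations above.
    p = {"Wxf": w(d_h, d_in), "Wy": w(d_y, d_h), "by": np.zeros(d_y)}
    for g in "ico":
        p["Wx" + g] = w(d_h, d_x)
    for g in "ifco":
        p["Wh" + g] = w(d_h, d_h)
        p["b" + g] = np.zeros(d_h)
    for g in "ifo":
        p["Wc" + g] = w(d_h, d_h)  # cell-to-gate weights (assumed full matrices)
    return p


def deep_lstm_step(x, state, layers):
    """One time step. state holds (h, c) per layer; returns y(t) and the new state."""
    ys, new_state, y_prev = [], [], None
    for p, (h_prev, c_prev) in zip(layers, state):
        # Skip connection: every layer sees the raw input x(t).
        xk = x if y_prev is None else np.concatenate([x, y_prev])  # x'(t, k)
        i = sigmoid(p["Wxi"] @ xk + p["Whi"] @ h_prev + p["Wci"] @ c_prev + p["bi"])
        f = sigmoid(p["Wxf"] @ x + p["Whf"] @ h_prev + p["Wcf"] @ c_prev + p["bf"])
        c = f * c_prev + i * np.tanh(p["Wxc"] @ xk + p["Whc"] @ h_prev + p["bc"])
        o = sigmoid(p["Wxo"] @ xk + p["Who"] @ h_prev + p["Wco"] @ c + p["bo"])
        h = o * np.tanh(c)
        y_prev = p["Wy"] @ h + p["by"]  # y'(t, k)
        ys.append(y_prev)
        new_state.append((h, c))
    # Skip connection: the output concatenates y'(t, 1) ... y'(t, K).
    return np.concatenate(ys), new_state


def g_lstm(inputs, layers, d_h):
    """Run the cell over the embedded sequence and return y at the final step."""
    state = [(np.zeros(d_h), np.zeros(d_h)) for _ in layers]
    y = None
    for x in inputs:
        y, state = deep_lstm_step(x, state, layers)
    return y


# Toy run with assumed sizes: K = 2 layers, T = 7 embedded tokens.
rng = np.random.default_rng(0)
d_in, d_y, d_h, K, T = 4, 3, 5, 2, 7
layers = [init_layer(rng, d_in, d_y, d_h, k) for k in range(1, K + 1)]
inputs = rng.normal(size=(T, d_in))  # stand-in for the embedded d ||| q sequence
y_final = g_lstm(inputs, layers, d_h)
```

Here `y_final` has length `K * d_y`, since the output concatenates the per-layer outputs. A full reader would map this vector to a score over candidate answer tokens, which is outside the scope of this sketch.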