The Deep LSTM Reader is a neural network for reading comprehension. We feed the document into a Deep LSTM encoder one word at a time and then, after a delimiter, feed in the query, so the model processes each document-query pair as a single long sequence. Given the embedded document and query, the network predicts which token in the document answers the query.
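As a sketch, the single-sequence input described above could be assembled as follows (the function name, token lists, and delimiter string are hypothetical; the model only requires some distinguished delimiter symbol between document and query):

```python
def build_input(document_tokens, query_tokens, delimiter="|||"):
    """Concatenate document and query into one long sequence,
    separated by a delimiter, as the Deep LSTM Reader consumes it."""
    return document_tokens + [delimiter] + query_tokens

# e.g. a 3-token document and a 5-token query yield a 9-token sequence
seq = build_input(["the", "cat", "sat"],
                  ["where", "did", "the", "cat", "sit"])
```

The encoder then reads this sequence left to right, so by the time it reaches the query tokens it has already seen the full document.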
The model consists of a Deep LSTM cell with skip connections from each input x(t) to every hidden layer, and from every hidden layer to the output y(t):
$$
\begin{aligned}
x'(t,k) &= x(t)\,\|\,y'(t,k-1), \qquad y(t) = y'(t,1)\,\|\,\dots\,\|\,y'(t,K)\\
i(t,k) &= \sigma\left(W_{kxi}\, x'(t,k) + W_{khi}\, h(t-1,k) + W_{kci}\, c(t-1,k) + b_{ki}\right)\\
f(t,k) &= \sigma\left(W_{kxf}\, x(t) + W_{khf}\, h(t-1,k) + W_{kcf}\, c(t-1,k) + b_{kf}\right)\\
c(t,k) &= f(t,k)\, c(t-1,k) + i(t,k)\, \tanh\left(W_{kxc}\, x'(t,k) + W_{khc}\, h(t-1,k) + b_{kc}\right)\\
o(t,k) &= \sigma\left(W_{kxo}\, x'(t,k) + W_{kho}\, h(t-1,k) + W_{kco}\, c(t,k) + b_{ko}\right)\\
h(t,k) &= o(t,k)\, \tanh(c(t,k))\\
y'(t,k) &= W_{ky}\, h(t,k) + b_{ky}
\end{aligned}
$$
where $\|$ indicates vector concatenation, $\sigma$ is the logistic sigmoid, $h(t,k)$ is the hidden state of layer $k$ at time $t$, and $i$, $f$, $o$ are the input, forget, and output gates respectively. Thus our Deep LSTM Reader is defined by $g_{\mathrm{LSTM}}(d,q) = y(|d|+|q|)$, with input $x(t)$ the concatenation of $d$ and $q$ separated by the delimiter $|||$.
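A minimal NumPy sketch of one timestep of this cell may make the skip connections concrete. The parameter names and random initialisation are hypothetical, and the peephole weights ($W_{kci}$, $W_{kcf}$, $W_{kco}$) are assumed diagonal, i.e. implemented as elementwise products with the cell state, which is the usual convention for peephole LSTMs:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_layer(k, d_x, n, rng):
    """Random parameters for layer k (hypothetical initialisation).
    Layer 0 sees only x(t); deeper layers see x'(t,k) = [x(t); y'(t,k-1)]."""
    d_in = d_x + (n if k > 0 else 0)
    s = lambda *shape: 0.1 * rng.standard_normal(shape)
    return {"Wxi": s(n, d_in), "Whi": s(n, n), "Wci": s(n), "bi": s(n),
            "Wxf": s(n, d_x), "Whf": s(n, n), "Wcf": s(n), "bf": s(n),
            "Wxc": s(n, d_in), "Whc": s(n, n), "bc": s(n),
            "Wxo": s(n, d_in), "Who": s(n, n), "Wco": s(n), "bo": s(n),
            "Wy": s(n, n), "by": s(n)}

def deep_lstm_step(x, h_prev, c_prev, params):
    """One timestep of the K-layer Deep LSTM with skip connections,
    following the equations above: each layer reads the raw input x(t)
    concatenated with the layer below's output, and y(t) concatenates
    every layer's output y'(t,k)."""
    y_parts, h_new, c_new = [], [], []
    y_below = np.zeros(0)                       # no layer below layer 0
    for k, p in enumerate(params):
        xk = np.concatenate([x, y_below])       # x'(t,k) = x(t) || y'(t,k-1)
        i = sigmoid(p["Wxi"] @ xk + p["Whi"] @ h_prev[k]
                    + p["Wci"] * c_prev[k] + p["bi"])
        f = sigmoid(p["Wxf"] @ x + p["Whf"] @ h_prev[k]
                    + p["Wcf"] * c_prev[k] + p["bf"])
        c = f * c_prev[k] + i * np.tanh(p["Wxc"] @ xk
                                        + p["Whc"] @ h_prev[k] + p["bc"])
        o = sigmoid(p["Wxo"] @ xk + p["Who"] @ h_prev[k]
                    + p["Wco"] * c + p["bo"])
        h = o * np.tanh(c)
        y = p["Wy"] @ h + p["by"]               # y'(t,k)
        h_new.append(h); c_new.append(c); y_parts.append(y)
        y_below = y
    return np.concatenate(y_parts), h_new, c_new  # y(t) = y'(t,1)||...||y'(t,K)
```

Running the encoder then amounts to iterating `deep_lstm_step` over the embedded document, the delimiter, and the query; the output at the final timestep is $y(|d|+|q|)$, i.e. $g_{\mathrm{LSTM}}(d,q)$.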