The Mogrifier LSTM is an extension of the LSTM in which the LSTM's input $x$ is gated, conditioned on the output of the previous step, $h_{\text{prev}}$. The gated input is then used in the same way to gate $h_{\text{prev}}$, the output of the previous time step. After a fixed number of rounds of this mutual gating, the most recently updated $x$ and $h_{\text{prev}}$ are fed to an ordinary LSTM.
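The alternating gating can be sketched in plain Python. This is a minimal illustration, not the authors' implementation: it assumes full-rank gating matrices $Q^i$ and $R^i$ (defined formally in the next paragraph) stored as nested lists, the helper names `sigmoid`, `matvec` and `mogrify` are hypothetical, and the final LSTM cell is omitted. The list `Q` holds $Q^1, Q^3, \dots$ and the list `R` holds $R^2, R^4, \dots$, so `Q[0]` is $Q^1$ and `R[0]` is $R^2$.

```python
import math


def sigmoid(z):
    """Numerically stable logistic sigmoid."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector (list)."""
    return [sum(a * b for a, b in zip(row, v)) for row in M]


def mogrify(x, h_prev, Q, R, rounds):
    """Alternately gate x and h_prev for `rounds` rounds (hypothetical helper).

    Q holds the matrices for odd rounds 1, 3, 5, ... (shape len(x) x len(h_prev)).
    R holds the matrices for even rounds 2, 4, ... (shape len(h_prev) x len(x)).
    Returns the last updated (x, h_prev), which would then be fed to an LSTM cell.
    """
    for i in range(1, rounds + 1):
        if i % 2 == 1:
            # Odd round: gate x using the current h_prev.
            gate = matvec(Q[i // 2], h_prev)
            x = [2.0 * sigmoid(g) * xv for g, xv in zip(gate, x)]
        else:
            # Even round: gate h_prev using the current x.
            gate = matvec(R[i // 2 - 1], x)
            h_prev = [2.0 * sigmoid(g) * hv for g, hv in zip(gate, h_prev)]
    return x, h_prev
```

With all-zero `Q` and `R` matrices every gate equals $2\sigma(0) = 1$, so `mogrify` returns its inputs unchanged for any number of rounds; `rounds=0` skips the gating entirely, matching the statement that $r = 0$ recovers the LSTM.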
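In detail, the Mogrifier is an LSTM where two inputs $x$ and $h_{\text{prev}}$ modulate one another in an alternating fashion before the usual LSTM computation takes place. That is,

$$\operatorname{Mogrify}(x, c_{\text{prev}}, h_{\text{prev}}) = \operatorname{LSTM}(x^{\uparrow}, c_{\text{prev}}, h^{\uparrow}_{\text{prev}}),$$

where the modulated inputs $x^{\uparrow}$ and $h^{\uparrow}_{\text{prev}}$ are defined as the highest-indexed $x^i$ and $h^i_{\text{prev}}$, respectively, from the interleaved sequences

$$x^i = 2\sigma\left(Q^i h^{i-1}_{\text{prev}}\right) \odot x^{i-2}, \quad \text{for odd } i \in [1 \dots r],$$

$$h^i_{\text{prev}} = 2\sigma\left(R^i x^{i-1}\right) \odot h^{i-2}_{\text{prev}}, \quad \text{for even } i \in [1 \dots r],$$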
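with $x^{-1} = x$ and $h^0_{\text{prev}} = h_{\text{prev}}$. The number of "rounds", $r \in \mathbb{N}$, is a hyperparameter; $r = 0$ recovers the LSTM. Multiplication by the constant 2 ensures that randomly initialized $Q^i$, $R^i$ matrices result in transformations close to identity: when the pre-activations $Q^i h^{i-1}_{\text{prev}}$ and $R^i x^{i-1}$ are near zero, as they tend to be for small random weights, each gate is approximately $2\sigma(0) = 1$ and passes its input through nearly unchanged. To reduce the number of additional model parameters, we typically factorize the $Q^i$, $R^i$ matrices as products of low-rank matrices: $Q^i = Q^i_{\text{left}} Q^i_{\text{right}}$ with $Q^i \in \mathbb{R}^{m \times n}$, $Q^i_{\text{left}} \in \mathbb{R}^{m \times k}$, $Q^i_{\text{right}} \in \mathbb{R}^{k \times n}$, where $k < \min(m, n)$ is the rank. This reduces each matrix from $mn$ to $k(m + n)$ parameters.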
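The parameter saving from the low-rank factorization, and the near-identity gates at initialization, can be checked with a small sketch. It is an illustration under stated assumptions, not the source's implementation: the uniform initialization range `scale=0.01` is an illustrative choice rather than a value from the source, the helper names (`init_low_rank`, `low_rank_matvec`, `param_counts`) are hypothetical, and $x$ is taken to have $m$ entries and $h_{\text{prev}}$ to have $n$ entries so that $Q^i h_{\text{prev}}$ is defined.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic sigmoid."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector (list)."""
    return [sum(a * b for a, b in zip(row, v)) for row in M]


def matmul(A, B):
    """Multiply an m x k matrix A by a k x n matrix B (lists of rows)."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def init_low_rank(m, n, k, scale=0.01, seed=0):
    """Randomly initialise Q_left (m x k) and Q_right (k x n).

    The uniform range `scale` is an illustrative assumption.
    """
    if not 0 < k < min(m, n):
        raise ValueError("rank k must satisfy 0 < k < min(m, n)")
    rng = random.Random(seed)
    left = [[rng.uniform(-scale, scale) for _ in range(k)] for _ in range(m)]
    right = [[rng.uniform(-scale, scale) for _ in range(n)] for _ in range(k)]
    return left, right


def low_rank_matvec(left, right, v):
    """Compute (Q_left Q_right) v as Q_left (Q_right v) without forming the m x n matrix."""
    return matvec(left, matvec(right, v))


def param_counts(m, n, k):
    """(parameters of a full m x n matrix, parameters of its rank-k factorisation)."""
    return m * n, k * (m + n)


if __name__ == "__main__":
    m, n, k = 6, 4, 2  # x has m entries, h_prev has n entries
    left, right = init_low_rank(m, n, k)
    h_prev = [0.5, -1.0, 0.25, 1.0]
    gates = [2.0 * sigmoid(z) for z in low_rank_matvec(left, right, h_prev)]
    print(gates)  # every entry is close to 1.0 at this small init
    print(param_counts(1000, 800, 40))  # (800000, 72000)
```

Applying the factors right to left, as `low_rank_matvec` does, also costs about $k(m + n)$ multiply-adds per product instead of $mn$, so the factorization saves computation as well as parameters.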