**Embedding Dropout** is equivalent to performing [dropout](https://paperswithcode.com/method/dropout) on the embedding matrix at the word level, where the dropout mask is broadcast across every dimension of each word’s embedding vector. The remaining non-dropped-out word embeddings are scaled by $\frac{1}{1-p\_{e}}$, where $p\_{e}$ is the probability of embedding dropout. Because the dropout is applied to the embedding matrix that is used for a full forward and backward pass, all occurrences of a specific word disappear within that pass, which is equivalent to performing [variational dropout](https://paperswithcode.com/method/variational-dropout) on the connection between the one-hot embedding and the embedding lookup.

Source: Merity et al., Regularizing and Optimizing [LSTM](https://paperswithcode.com/method/lstm) Language Models
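The Embedding Dropout description above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function name `embedding_dropout`, the toy all-ones matrix, and the token ids are hypothetical and chosen only to make the row-wise mask and the $\frac{1}{1-p\_{e}}$ scaling visible.

```python
import numpy as np

def embedding_dropout(embedding, p_e, rng, training=True):
    """Drop whole rows (words) of an embedding matrix and rescale the survivors.

    Illustrative helper, assuming 0 <= p_e < 1.
    """
    if not 0.0 <= p_e < 1.0:
        raise ValueError("p_e must be in [0, 1)")
    if not training or p_e == 0.0:
        return embedding
    vocab_size = embedding.shape[0]
    # One Bernoulli draw per word; the (vocab_size, 1) shape broadcasts
    # the same keep/drop decision across the whole embedding vector.
    keep = rng.random((vocab_size, 1)) >= p_e
    return embedding * keep / (1.0 - p_e)

rng = np.random.default_rng(0)
E = np.ones((6, 4))                      # toy vocabulary of 6 words, 4 dimensions
E_dropped = embedding_dropout(E, p_e=0.5, rng=rng)

# The masked matrix is reused for the whole pass, so repeated tokens
# (ids 1 and 3 here) are either all kept or all dropped together.
tokens = np.array([1, 3, 1, 5, 3])
looked_up = E_dropped[tokens]
```

Sampling the mask once per pass, rather than once per lookup, is what makes every occurrence of a dropped word vanish together.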

The **Graph Echo State Network** (**GraphESN**) model is a generalization of the Echo State Network (ESN) approach to graph domains. GraphESNs provide an efficient approach to Recursive Neural Network (RecNN) modeling, extended to deal with cyclic/acyclic, directed/undirected, labeled graphs. The recurrent reservoir of the network computes a fixed contractive encoding function over graphs and is left untrained after initialization, while a feed-forward readout implements an adaptive linear output function. Contractivity of the state transition function implies a Markovian characterization of state dynamics and stability of the state computation in the presence of cycles. Because it uses a fixed (untrained) encoding, the model is both an extremely efficient variant of recursive models and a baseline for the performance of recursive models with trained connections.

Description from: [Graph Echo State Networks](https://ieeexplore.ieee.org/document/5596796)
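The GraphESN description above can be sketched in NumPy under several stated assumptions. The reservoir update sums neighbor states through a fixed random matrix `w_hat`, contractivity is enforced by rescaling so that the spectral norm of `w_hat` times the maximum vertex degree equals a chosen `sigma < 1`, vertex states are averaged into a graph-level state, and the linear readout is fit by ridge regression. All function names, the `tanh` nonlinearity, the averaging step, and the ridge fit are illustrative choices, not details confirmed by the text above.

```python
import numpy as np

def make_reservoir(n_in, n_res, max_degree, sigma, rng):
    """Random, untrained reservoir weights, rescaled to be contractive."""
    w_in = rng.uniform(-1, 1, (n_res, n_in))
    w_hat = rng.uniform(-1, 1, (n_res, n_res))
    # Assumed contractivity condition: ||w_hat||_2 * max_degree = sigma < 1.
    w_hat *= sigma / (np.linalg.norm(w_hat, 2) * max_degree)
    return w_in, w_hat

def graph_esn_states(adj, labels, w_in, w_hat, tol=1e-6, max_iter=200):
    """Iterate the fixed encoding until vertex states stop changing.

    adj: (n, n) 0/1 adjacency matrix; labels: (n, n_in) vertex labels.
    """
    x = np.zeros((adj.shape[0], w_hat.shape[0]))
    drive = labels @ w_in.T
    for _ in range(max_iter):
        # adj @ x sums the current states of each vertex's neighbors.
        x_new = np.tanh(drive + adj @ x @ w_hat.T)
        if np.max(np.abs(x_new - x)) < tol:
            return x_new
        x = x_new
    return x

def fit_readout(graph_states, targets, ridge=1e-3):
    """Ridge-regression linear readout with a bias term (the only trained part)."""
    X = np.hstack([graph_states, np.ones((graph_states.shape[0], 1))])
    return np.linalg.solve(X.T @ X + ridge * np.eye(X.shape[1]), X.T @ targets)

rng = np.random.default_rng(0)
w_in, w_hat = make_reservoir(n_in=2, n_res=8, max_degree=2, sigma=0.5, rng=rng)

triangle = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]], dtype=float)  # cyclic
path = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)      # acyclic
labels = rng.uniform(-1, 1, (3, 2))

states_triangle = graph_esn_states(triangle, labels, w_in, w_hat)
states_path = graph_esn_states(path, labels, w_in, w_hat)
graph_states = np.stack([states_triangle.mean(axis=0), states_path.mean(axis=0)])
readout = fit_readout(graph_states, np.array([1.0, -1.0]))
```

Because the update is a contraction, the iteration settles to a fixed point even on the cyclic triangle graph, which is the stability property the description refers to.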

A **Multiplicative LSTM (mLSTM)** is a recurrent neural network architecture for sequence modelling that combines the long short-term memory ([LSTM](https://paperswithcode.com/method/lstm)) and multiplicative recurrent neural network ([mRNN](https://paperswithcode.com/method/mrnn)) architectures. The mRNN and LSTM architectures can be combined by adding connections from the mRNN’s intermediate state $m\_{t}$ to each gating unit in the LSTM.
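A single mLSTM step, as described above, might look like the following NumPy sketch. The text only states that $m\_{t}$ feeds each gating unit, so the rest is assumed: $m\_{t}$ is formed as an elementwise product of projections of the input and the previous hidden state (as in an mRNN), it replaces the previous hidden state in the candidate and gate computations, and the candidate is squashed with `tanh` as in a common LSTM formulation. The parameter names (`W_mx`, `W_mh`, and so on), the omission of biases, and the toy sizes are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mlstm_step(x_t, h_prev, c_prev, p):
    """One mLSTM step; p is a dict of weight matrices (biases omitted)."""
    # mRNN-style intermediate state: input-dependent gating of h_prev.
    m_t = (p["W_mx"] @ x_t) * (p["W_mh"] @ h_prev)
    # m_t feeds the candidate and every gate in place of h_prev.
    h_hat = p["W_hx"] @ x_t + p["W_hm"] @ m_t
    i_t = sigmoid(p["W_ix"] @ x_t + p["W_im"] @ m_t)  # input gate
    f_t = sigmoid(p["W_fx"] @ x_t + p["W_fm"] @ m_t)  # forget gate
    o_t = sigmoid(p["W_ox"] @ x_t + p["W_om"] @ m_t)  # output gate
    c_t = f_t * c_prev + i_t * np.tanh(h_hat)
    h_t = np.tanh(c_t) * o_t
    return h_t, c_t

rng = np.random.default_rng(0)
n_in, n_hid = 3, 5
names = ["W_mx", "W_mh", "W_hx", "W_hm", "W_ix", "W_im",
         "W_fx", "W_fm", "W_ox", "W_om"]
# Matrices ending in "x" read the input; all others read an n_hid-sized vector.
params = {
    name: rng.normal(0.0, 0.1, (n_hid, n_in if name.endswith("x") else n_hid))
    for name in names
}

xs = rng.normal(size=(4, n_in))          # toy sequence of 4 input vectors
h, c = np.zeros(n_hid), np.zeros(n_hid)
for x_t in xs:
    h, c = mlstm_step(x_t, h, c, params)
```

Because $m\_{t}$ multiplies a projection of the input with a projection of the hidden state, each input can select a different effective hidden-to-hidden transition before the LSTM gates are applied.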

Source	A Theoretically Grounded Application of Dropout in Recurrent Neural Networks
Year	2000
Data Source	CC BY-SA - https://paperswithcode.com