What is: Embedding Dropout?

Source: A Theoretically Grounded Application of Dropout in Recurrent Neural Networks
Year: 2000
Data Source: CC BY-SA - https://paperswithcode.com

Embedding Dropout performs dropout on the embedding matrix at the word level: each word's row is either kept or dropped as a whole, with the dropout mask broadcast across every dimension of that word's embedding vector. The word embeddings that are not dropped are scaled by $\frac{1}{1-p_e}$, where $p_e$ is the probability of embedding dropout. Because the dropout is applied to the embedding matrix that is used for a full forward and backward pass, all occurrences of a dropped word disappear within that pass. This is equivalent to performing variational dropout on the connection between the one-hot embedding and the embedding lookup.
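The description above can be sketched in a few lines of plain Python. This is a minimal illustration and not the implementation used by Merity et al. The function names `embedding_dropout` and `lookup`, the toy matrix `E`, and the choice of `p_e=0.5` are all assumptions made for the example. A real model would apply the same idea to a framework tensor.

```python
import random


def embedding_dropout(embedding, p_e, rng=None):
    """Return a copy of `embedding` with whole word rows dropped.

    embedding: list of rows, one row per vocabulary word (hypothetical layout).
    p_e: probability of dropping each word's row.
    """
    if not 0.0 <= p_e < 1.0:
        raise ValueError("p_e must be in [0, 1)")
    if rng is None:
        rng = random.Random()
    scale = 1.0 / (1.0 - p_e)  # rescale the surviving rows by 1 / (1 - p_e)
    masked = []
    for row in embedding:
        # One Bernoulli draw per word, broadcast across the whole row.
        keep = rng.random() >= p_e
        masked.append([x * scale if keep else 0.0 for x in row])
    return masked


def lookup(embedding, token_ids):
    """Plain embedding lookup: one row per token id."""
    return [embedding[i] for i in token_ids]


# Toy vocabulary of 5 words with 3-dimensional embeddings (illustrative values).
E = [[float(w + 1)] * 3 for w in range(5)]

# Sample the mask once, then reuse the masked matrix for the whole pass.
masked_E = embedding_dropout(E, p_e=0.5, rng=random.Random(0))

# Token 2 appears three times; all three lookups see the same row.
batch = lookup(masked_E, [2, 0, 2, 4, 2])
```

Because the mask is drawn once on the matrix and not once per token position, every occurrence of a word within the pass is either kept (and rescaled) or zeroed together. That is the property the paragraph above describes.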

Source: Merity et al., Regularizing and Optimizing LSTM Language Models