
What is: Adaptive Dropout?

Source: Adaptive dropout for training deep neural networks
Year: 2013
Data Source: CC BY-SA - https://paperswithcode.com

Adaptive Dropout is a regularization technique that extends dropout by allowing the dropout probability to be different for different units. The intuition is that there may be hidden units that can individually make confident predictions for the presence or absence of an important feature or combination of features. Dropout will ignore this confidence and drop the unit out 50% of the time.

Denote the activity of unit $j$ in a deep neural network by $a_{j}$ and assume that its inputs are $\{a_{i}: i < j\}$. In dropout, $a_{j}$ is randomly set to zero with probability 0.5. Let $m_{j}$ be a binary variable used to mask the activity $a_{j}$, so that its value is:

$$a_{j} = m_{j}\, g\left(\sum_{i: i<j} w_{j,i}\, a_{i}\right)$$

where $w_{j,i}$ is the weight from unit $i$ to unit $j$, $g(\cdot)$ is the activation function, and $a_{0} = 1$ accounts for biases. Whereas in standard dropout $m_{j}$ is Bernoulli with probability $0.5$, adaptive dropout uses a dropout probability that depends on the input activities:

$$P\left(m_{j} = 1 \mid \{a_{i}: i < j\}\right) = f\left(\sum_{i: i<j} \pi_{j,i}\, a_{i}\right)$$

where $\pi_{j,i}$ is the weight from unit $i$ to unit $j$ in the standout network (the adaptive dropout network), and $f(\cdot)$ is a sigmoidal function. Here 'standout' refers to a binary belief network that is overlaid on the neural network as part of the overall regularization technique.
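The two equations above can be sketched in NumPy for a single hidden unit. This is a minimal illustration, not the paper's implementation: the function name `standout_unit` is made up for this example, ReLU is an arbitrary choice for $g(\cdot)$, and the logistic sigmoid is used for $f(\cdot)$.

```python
import numpy as np

def sigmoid(x):
    # f(.): sigmoidal function used by the standout network
    return 1.0 / (1.0 + np.exp(-x))

def standout_unit(a_in, w, pi, rng):
    """One hidden unit with adaptive (standout) dropout.

    a_in : activities of earlier units, with a_in[0] = 1 for the bias
    w    : weights w_{j,i} of the main network
    pi   : weights pi_{j,i} of the standout network
    """
    # Adaptive keep probability: P(m_j = 1 | inputs) = f(sum_i pi_{j,i} a_i)
    keep_prob = sigmoid(pi @ a_in)
    # Sample the binary mask m_j from that probability
    m = rng.random() < keep_prob
    # a_j = m_j * g(sum_i w_{j,i} a_i), here with g = ReLU (an assumption)
    a_j = m * np.maximum(w @ a_in, 0.0)
    return a_j, keep_prob

rng = np.random.default_rng(0)
a_in = np.array([1.0, 0.5])       # bias plus one input activity
w = np.array([0.3, -0.2])
pi = np.array([0.1, 0.2])
a_j, kp = standout_unit(a_in, w, pi, rng)
```

At test time, the paper replaces the stochastic mask with a deterministic forward pass, weighting the activation by the keep probability, analogous to the expectation trick used in standard dropout.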