
What is: ShakeDrop?

Source: ShakeDrop Regularization for Deep Residual Learning
Year: 2018
Data Source: CC BY-SA - https://paperswithcode.com

ShakeDrop regularization extends Shake-Shake regularization so that it can be applied not only to ResNeXt but also to ResNet, WideResNet, and PyramidNet. ShakeDrop is given as

$$G(x) = x + (b_l + \alpha - b_l\alpha)\,F(x), \text{ in train-fwd}$$

$$G(x) = x + (b_l + \beta - b_l\beta)\,F(x), \text{ in train-bwd}$$

$$G(x) = x + E[b_l + \alpha - b_l\alpha]\,F(x), \text{ in test}$$

where $b_l$ is a Bernoulli random variable with probability $P(b_l = 1) = E[b_l] = p_l$ given by the linear decay rule in each layer, and $\alpha$ and $\beta$ are independent uniform random variables drawn for each element.
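
As a rough illustration, here is a minimal PyTorch-style sketch of these three rules as a custom autograd function (the class name, argument defaults, and shapes are our own choices, not code from the paper): the forward pass scales the residual branch $F(x)$ by $b_l + \alpha - b_l\alpha$, the backward pass rescales the incoming gradient by $b_l + \beta - b_l\beta$, and at test time the branch is scaled by the expectation $E[b_l + \alpha - b_l\alpha] = p_l + E[\alpha](1 - p_l)$.

```python
import torch


class ShakeDropFunction(torch.autograd.Function):
    """Sketch of ShakeDrop applied to a residual branch F(x).

    The surrounding block computes G(x) = x + ShakeDropFunction.apply(F(x), ...).
    """

    @staticmethod
    def forward(ctx, x, p_l, alpha_range=(-1.0, 1.0), beta_range=(0.0, 1.0),
                training=True):
        if not training:
            # Test: scale by E[b_l + alpha - b_l*alpha] = p_l + E[alpha] * (1 - p_l).
            # With alpha ~ U[-1, 1], E[alpha] = 0, so the factor reduces to p_l.
            e_alpha = sum(alpha_range) / 2.0
            return x * (p_l + e_alpha * (1.0 - p_l))
        # b_l ~ Bernoulli(p_l): one draw per layer invocation.
        b_l = torch.bernoulli(torch.full((1,), p_l, device=x.device))
        # alpha: independent uniform draw for each element.
        alpha = torch.empty_like(x).uniform_(*alpha_range)
        ctx.save_for_backward(b_l)
        ctx.beta_range = beta_range
        # Train-fwd: scale the branch by (b_l + alpha - b_l * alpha).
        return x * (b_l + alpha - b_l * alpha)

    @staticmethod
    def backward(ctx, grad_output):
        (b_l,) = ctx.saved_tensors
        # beta: drawn per element, independently of alpha.
        beta = torch.empty_like(grad_output).uniform_(*ctx.beta_range)
        # Train-bwd: rescale the gradient by (b_l + beta - b_l * beta).
        return grad_output * (b_l + beta - b_l * beta), None, None, None, None
```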

The most effective ranges of $\alpha$ and $\beta$ were experimentally found to differ from those of Shake-Shake; two settings performed best: $\alpha = 0$, $\beta \in [0, 1]$ and $\alpha \in [-1, 1]$, $\beta \in [0, 1]$.
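
To make the moving parts concrete, the snippet below (layer index, depth, and tensor shapes are invented for illustration) wraps the sketch above around a stand-in residual branch, setting $p_l$ with the linear decay rule $p_l = 1 - \frac{l}{L}(1 - p_L)$, $p_L = 0.5$, that ShakeDrop borrows from stochastic depth, together with the $\alpha \in [-1, 1]$, $\beta \in [0, 1]$ ranges above.

```python
# Hypothetical usage around one residual block (layer l of L_total in total).
# Linear decay rule borrowed from stochastic depth: p_l = 1 - (l / L) * (1 - p_L).
L_total, p_L = 54, 0.5
l = 10                                               # invented layer index
p_l = 1.0 - (l / L_total) * (1.0 - p_L)

x = torch.randn(8, 16, 32, 32, requires_grad=True)   # block input
residual = x * 0.5                                   # stand-in for F(x)
out = x + ShakeDropFunction.apply(
    residual, p_l, (-1.0, 1.0), (0.0, 1.0), True     # alpha/beta ranges, training
)
out.sum().backward()                                 # gradient passes through the beta-scaling
```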