ShakeDrop regularization extends Shake-Shake regularization and can be applied not only to ResNeXt but also to ResNet, WideResNet, and PyramidNet. The proposed ShakeDrop is given by
G(x) = x + (b_l + α − b_l α) F(x)    (training, forward pass)
G(x) = x + (b_l + β − b_l β) F(x)    (training, backward pass)
G(x) = x + E[b_l + α − b_l α] F(x)   (test)
where b_l is a Bernoulli random variable with probability P(b_l = 1) = E[b_l] = p_l, set per layer by the linear decay rule p_l = 1 − (l/L)(1 − p_L) as in stochastic depth (so deeper layers are perturbed more often; p_L is typically 0.5), and α and β are independent uniform random variables drawn per element.
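Since α and b_l are independent, the test-time coefficient simplifies to E[b_l + α − b_l α] = p_l + E[α](1 − p_l); with the recommended range α ∈ [−1, 1] (so E[α] = 0) it reduces to p_l, the same test-time scaling used by stochastic depth.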
The most effective ranges of α and β were found experimentally to differ from those of Shake-Shake; the two best-performing settings are α = 0 with β ∈ [0, 1], and α ∈ [−1, 1] with β ∈ [0, 1]. A sketch implementation follows.
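For concreteness, below is a minimal PyTorch sketch of a ShakeDrop unit, assuming the range α ∈ [−1, 1] and β ∈ [0, 1]; the class names and interface here are illustrative, not the authors' reference implementation. A custom autograd Function is used because α (forward) and β (backward) must be different random draws.

```python
import torch
from torch.autograd import Function


class ShakeDropFunction(Function):
    # Illustrative sketch: scales F(x) by (b_l + alpha - b_l*alpha) on the
    # forward pass and (b_l + beta - b_l*beta) on the backward pass.
    @staticmethod
    def forward(ctx, x, training, p_l):
        if training:
            # b_l ~ Bernoulli(p_l), one draw per residual branch
            b_l = torch.bernoulli(torch.full((1,), p_l, device=x.device))
            # alpha ~ U[-1, 1], drawn independently per element
            alpha = torch.empty_like(x).uniform_(-1.0, 1.0)
            ctx.save_for_backward(b_l)
            return (b_l + alpha - b_l * alpha) * x
        # test time: scale by E[b_l + alpha - b_l*alpha] = p_l (since E[alpha] = 0)
        return p_l * x

    @staticmethod
    def backward(ctx, grad_output):
        # Assumes training mode; eval-mode backward is not supported in this sketch.
        (b_l,) = ctx.saved_tensors
        # beta ~ U[0, 1], drawn independently per element on the backward pass
        beta = torch.empty_like(grad_output).uniform_(0.0, 1.0)
        return (b_l + beta - b_l * beta) * grad_output, None, None


class ShakeDrop(torch.nn.Module):
    """Wraps the residual-branch output F(x); the caller adds the identity x."""

    def __init__(self, p_l):
        super().__init__()
        self.p_l = p_l

    def forward(self, f_x):
        return ShakeDropFunction.apply(f_x, self.training, self.p_l)
```

Inside a residual block this would be used as out = x + shake_drop(branch(x)), with each block's p_l set by the linear decay rule above.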