ShakeDrop regularization extends Shake-Shake regularization and can be applied not only to ResNeXt but also to ResNet, WideResNet, and PyramidNet. The proposed ShakeDrop is given by
G(x) = x + (b_l + α − b_l α) F(x)    (training, forward pass)
G(x) = x + (b_l + β − b_l β) F(x)    (training, backward pass)
G(x) = x + E[b_l + α − b_l α] F(x)   (test)
where b_l is a Bernoulli random variable with probability P(b_l = 1) = E[b_l] = p_l, set per layer by the linear decay rule p_l = 1 − (l/L)(1 − p_L) as in stochastic depth (so deeper layers are perturbed more often; p_L is typically 0.5), and α and β are independent uniform random variables drawn per element.
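Since α and b_l are independent, the test-time coefficient simplifies to E[b_l + α − b_l α] = p_l + E[α](1 − p_l); with the recommended range α ∈ [−1, 1] (so E[α] = 0) it reduces to p_l, the same test-time scaling used by stochastic depth.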
The most effective ranges of α and β were found experimentally to differ from those of Shake-Shake; the two best-performing settings are α = 0 with β ∈ [0, 1], and α ∈ [−1, 1] with β ∈ [0, 1]. A sketch implementation follows.
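For concreteness, below is a minimal PyTorch sketch of a ShakeDrop unit, assuming the range α ∈ [−1, 1] and β ∈ [0, 1]; the class names and interface here are illustrative, not the authors' reference implementation. A custom autograd Function is used because α (forward) and β (backward) must be different random draws.

```python
import torch
from torch.autograd import Function


class ShakeDropFunction(Function):
    # Illustrative sketch: scales F(x) by (b_l + alpha - b_l*alpha) on the
    # forward pass and (b_l + beta - b_l*beta) on the backward pass.
    @staticmethod
    def forward(ctx, x, training, p_l):
        if training:
            # b_l ~ Bernoulli(p_l), one draw per residual branch
            b_l = torch.bernoulli(torch.full((1,), p_l, device=x.device))
            # alpha ~ U[-1, 1], drawn independently per element
            alpha = torch.empty_like(x).uniform_(-1.0, 1.0)
            ctx.save_for_backward(b_l)
            return (b_l + alpha - b_l * alpha) * x
        # test time: scale by E[b_l + alpha - b_l*alpha] = p_l (since E[alpha] = 0)
        return p_l * x

    @staticmethod
    def backward(ctx, grad_output):
        # Assumes training mode; eval-mode backward is not supported in this sketch.
        (b_l,) = ctx.saved_tensors
        # beta ~ U[0, 1], drawn independently per element on the backward pass
        beta = torch.empty_like(grad_output).uniform_(0.0, 1.0)
        return (b_l + beta - b_l * beta) * grad_output, None, None


class ShakeDrop(torch.nn.Module):
    """Wraps the residual-branch output F(x); the caller adds the identity x."""

    def __init__(self, p_l):
        super().__init__()
        self.p_l = p_l

    def forward(self, f_x):
        return ShakeDropFunction.apply(f_x, self.training, self.p_l)
```

Inside a residual block this would be used as out = x + shake_drop(branch(x)), with each block's p_l set by the linear decay rule above.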