
What is: Stochastic Depth?

Source: Deep Networks with Stochastic Depth
Year: 2016
Data Source: CC BY-SA - https://paperswithcode.com

Stochastic Depth aims to shrink the depth of a network during training, while keeping it unchanged during testing. This is achieved by randomly dropping entire ResBlocks during training and bypassing their transformations through skip connections.

Let $b_{l} \in \{0, 1\}$ denote a Bernoulli random variable which indicates whether the $l$-th ResBlock is active ($b_{l} = 1$) or inactive ($b_{l} = 0$). Further, let us denote the “survival” probability of ResBlock $l$ as $p_{l} = \text{Pr}\left(b_{l} = 1\right)$. With this definition we can bypass the $l$-th ResBlock by multiplying its function $f_{l}$ with $b_{l}$, and we extend the update rule to:

$$H_{l} = \text{ReLU}\left(b_{l} f_{l}\left(H_{l-1}\right) + \text{id}\left(H_{l-1}\right)\right)$$

If $b_{l} = 1$, this reduces to the original ResNet update and the ResBlock remains unchanged. If $b_{l} = 0$, the ResBlock reduces to the identity function, $H_{l} = \text{id}\left(H_{l-1}\right)$.
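
Below is a minimal sketch of this update rule for a single ResBlock, assuming PyTorch; the class name `StochasticDepthResBlock` and the `survival_prob` argument are illustrative choices, not taken from the paper's code. During training the block samples $b_{l}$ from a Bernoulli distribution and bypasses $f_{l}$ whenever $b_{l} = 0$; at test time the block is always applied, so the depth of the network is unchanged.

```python
import torch
import torch.nn as nn


class StochasticDepthResBlock(nn.Module):
    """Residual block that is randomly bypassed during training.

    Implements H_l = ReLU(b_l * f_l(H_{l-1}) + id(H_{l-1})),
    with b_l ~ Bernoulli(p_l) sampled only in training mode.
    """

    def __init__(self, channels: int, survival_prob: float = 0.8):
        super().__init__()
        self.survival_prob = survival_prob  # p_l = Pr(b_l = 1)
        # f_l: the residual transformation (two 3x3 convs, as in a basic ResBlock).
        self.f = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if self.training:
            # Sample b_l: keep the block with probability p_l,
            # otherwise bypass f_l entirely via the skip connection.
            b = torch.bernoulli(torch.tensor(self.survival_prob, device=x.device))
            if b == 0:
                # H_l = ReLU(id(H_{l-1})), i.e. the identity when H_{l-1}
                # is already non-negative (as after a preceding ReLU).
                return self.relu(x)
            return self.relu(self.f(x) + x)
        # Test time: the block is always active, so the depth is unchanged.
        return self.relu(self.f(x) + x)
```

Here `survival_prob` is a fixed per-block constant for simplicity; in the paper the survival probabilities typically decay linearly with depth, so later ResBlocks are dropped more often than earlier ones.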