
What is: Shake-Shake Regularization?

Source: Shake-Shake regularization
Year: 2017
Data Source: CC BY-SA - https://paperswithcode.com

Shake-Shake Regularization aims to improve the generalization ability of multi-branch networks by replacing the standard summation of parallel branches with a stochastic affine combination. A typical pre-activation ResNet with two residual branches computes:

$$x_{i+1} = x_i + \mathcal{F}\left(x_i, \mathcal{W}_i^{(1)}\right) + \mathcal{F}\left(x_i, \mathcal{W}_i^{(2)}\right)$$

Shake-shake regularization introduces a random variable $\alpha_i$, drawn from a uniform distribution between 0 and 1 during training:

$$x_{i+1} = x_i + \alpha_i \mathcal{F}\left(x_i, \mathcal{W}_i^{(1)}\right) + \left(1 - \alpha_i\right) \mathcal{F}\left(x_i, \mathcal{W}_i^{(2)}\right)$$

Following the same logic as for dropout, all $\alpha_i$ are set to their expected value of 0.5 at test time.
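
To make the train/test behavior concrete, below is a minimal PyTorch sketch of this rule. The branch definition (`_make_branch`) and the per-forward-pass sampling of $\alpha_i$ are illustrative assumptions, not the exact architecture or sampling scheme from the paper.

```python
import torch
import torch.nn as nn


class ShakeShakeBlock(nn.Module):
    """Two-branch residual block with shake-shake regularization.

    The branch architecture below is a simplified stand-in for the
    pre-activation residual branches used in the paper.
    """

    def __init__(self, channels: int):
        super().__init__()
        self.branch1 = self._make_branch(channels)
        self.branch2 = self._make_branch(channels)

    @staticmethod
    def _make_branch(channels: int) -> nn.Sequential:
        # Placeholder residual branch F(x, W): two ReLU-Conv-BN stages.
        return nn.Sequential(
            nn.ReLU(),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
            nn.ReLU(),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        f1 = self.branch1(x)
        f2 = self.branch2(x)
        if self.training:
            # Training: alpha_i ~ Uniform(0, 1), resampled on every forward pass.
            alpha = torch.rand(1, device=x.device)
        else:
            # Test time: alpha_i is fixed to its expected value of 0.5.
            alpha = 0.5
        # Stochastic affine combination of the two branches.
        return x + alpha * f1 + (1 - alpha) * f2


# Usage: the same module behaves differently in train and eval modes.
block = ShakeShakeBlock(channels=16)
block.train()   # random alpha per forward pass
out_train = block(torch.randn(2, 16, 32, 32))
block.eval()    # alpha fixed at 0.5
out_eval = block(torch.randn(2, 16, 32, 32))
```

Note that the granularity at which $\alpha_i$ is sampled (per mini-batch or per image) and whether the backward pass uses an independent random coefficient are further design choices explored in the paper; the sketch above implements only the forward rule described in this entry.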