What is: Online Normalization?

Online Normalization is a normalization technique for training deep neural networks. To define Online Normalization, we replace arithmetic averages over the full dataset with exponentially decaying averages of online samples. The decay factors $\alpha_{f}$ and $\alpha_{b}$, for the forward and backward passes respectively, are hyperparameters of the technique.
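The sketch below shows the practical difference between the two kinds of average. It is plain Python. `decaying_average` and the test stream are hypothetical. The sketch assumes the average is seeded with the first value rather than zero. An exponentially decaying average weights a sample seen $k$ steps ago by $(1-\alpha)\alpha^{k}$, so it follows statistics that shift over time. An arithmetic average weights every sample equally.

```python
def decaying_average(stream, alpha):
    """Exponentially decaying average: a_t = alpha * a_{t-1} + (1 - alpha) * s_t."""
    avg = stream[0]  # assumption: seed with the first value instead of zero
    for s in stream[1:]:
        avg = alpha * avg + (1.0 - alpha) * s
    return avg


# Hypothetical stream whose statistics shift halfway through.
stream = [0.0] * 100 + [10.0] * 100
arithmetic = sum(stream) / len(stream)         # every sample weighted equally
decayed = decaying_average(stream, alpha=0.9)  # recent samples weighted more
```

The arithmetic mean lands midway between the two regimes. The decaying average with $\alpha = 0.9$ ends up very close to the recent value of 10.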

We allow incoming samples $x_{t}$, such as images, to have multiple scalar components, and denote their feature-wide mean and variance by $\mu\left(x_{t}\right)$ and $\sigma^{2}\left(x_{t}\right)$. The algorithm also applies to outputs of fully connected layers, which have only one scalar output per feature. In that case $\mu\left(x_{t}\right) = x_{t}$ and $\sigma\left(x_{t}\right) = 0$. The scalars $\mu_{t}$ and $\sigma^{2}_{t}$ denote running estimates of the mean and variance across all samples. The subscript $t$ indexes the time step at which each new incoming sample is processed.
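The feature-wide statistics $\mu\left(x_{t}\right)$ and $\sigma^{2}\left(x_{t}\right)$ of a single sample can be sketched with NumPy. The sketch makes two assumptions that do not come from the source. First, features lie on axis 0 of each sample (for example `(C, H, W)` for an image). Second, the variance is the population variance (`ddof=0`). `sample_stats` is a hypothetical helper name.

```python
import numpy as np


def sample_stats(x):
    """Per-feature mean and variance of one sample x shaped (C, *spatial).

    Assumes features on axis 0 and population variance (ddof=0).
    """
    # Collapse every non-feature axis. A fully connected output shaped (C,)
    # becomes (C, 1), so each feature has a single scalar.
    flat = x.reshape(x.shape[0], -1)
    return flat.mean(axis=1), flat.var(axis=1)


image = np.arange(8, dtype=float).reshape(2, 2, 2)  # 2 channels of 2x2 pixels
img_mu, img_var = sample_stats(image)

fc_out = np.array([3.0, -1.0])                      # one scalar per feature
fc_mu, fc_var = sample_stats(fc_out)
```

For the fully connected output, the sketch reproduces the special case from the text. The mean equals the sample itself and the variance is zero.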

During the forward pass, Online Normalization estimates activation means and variances with an ongoing process. It implements the standard online computation of mean and variance, generalized to multi-value samples and to exponential averaging of sample statistics. The resulting estimates directly define an affine normalization transform:

$y_{t} = \frac{x_{t} - \mu_{t-1}}{\sigma_{t-1}}$

$\mu_{t} = \alpha_{f}\mu_{t-1} + \left(1-\alpha_{f}\right)\mu\left(x_{t}\right)$

$\sigma^{2}_{t} = \alpha_{f}\sigma^{2}_{t-1} + \left(1-\alpha_{f}\right)\sigma^{2}\left(x_{t}\right) + \alpha_{f}\left(1-\alpha_{f}\right)\left(\mu\left(x_{t}\right) - \mu_{t-1}\right)^{2}$
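The three forward-pass equations can be sketched in NumPy as follows. This is not the authors' implementation, and `OnlineNormForward` is a hypothetical name. The sketch adds these assumptions to what the source states:

- features lie on axis 0 of each sample;
- the initial estimates are $\mu_0 = 0$ and $\sigma^2_0 = 1$;
- a small `eps` is added inside the square root, which the equations do not include.

Only the forward statistics are covered. The backward-pass machinery governed by $\alpha_b$ is not sketched.

```python
import numpy as np


class OnlineNormForward:
    """Forward-pass statistics of Online Normalization (illustrative sketch).

    Assumptions: features on axis 0, mu_0 = 0, sigma^2_0 = 1, and eps
    added under the square root for numerical safety.
    """

    def __init__(self, num_features, alpha_f, eps=1e-5):
        self.alpha_f = alpha_f
        self.eps = eps
        self.mu = np.zeros(num_features)  # running mean mu_{t-1}
        self.var = np.ones(num_features)  # running variance sigma^2_{t-1}

    def forward(self, x):
        flat = x.reshape(x.shape[0], -1)
        sample_mu = flat.mean(axis=1)     # mu(x_t), one value per feature
        sample_var = flat.var(axis=1)     # sigma^2(x_t), 0 for one scalar per feature

        # Reshape per-feature statistics so they broadcast over spatial axes.
        shape = (-1,) + (1,) * (x.ndim - 1)
        # y_t uses the estimates from the previous step, mu_{t-1} and sigma_{t-1}.
        y = (x - self.mu.reshape(shape)) / np.sqrt(self.var.reshape(shape) + self.eps)

        a = self.alpha_f
        delta = sample_mu - self.mu       # mu(x_t) - mu_{t-1}
        # The variance update needs mu_{t-1}, so it runs before mu is overwritten.
        self.var = a * self.var + (1 - a) * sample_var + a * (1 - a) * delta ** 2
        self.mu = a * self.mu + (1 - a) * sample_mu
        return y
```

The three-term variance update is the exact variance of a mixture. The mixture weights the previous estimate by $\alpha_f$ and the new sample by $1-\alpha_f$. The cross term $\alpha_f(1-\alpha_f)(\mu(x_t)-\mu_{t-1})^2$ accounts for the gap between the two means. For example, take $\alpha_f = 0.5$, a previous state with mean 0 and variance 1, and a one-feature sample with values 2 and 4. The update gives mean 1.5 and variance 3.25. These match the pooled statistics of the values $-1, 1, 2, 4$.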

Source	Online Normalization for Training Neural Networks
Year	2019
Data Source	CC BY-SA - https://paperswithcode.com