Training Faster by Separating Modes of Variation in Batch-normalized Models
CC BY-SA - https://paperswithcode.com
Mixture Normalization is a normalization technique that relies on an approximation of the probability density function of the internal representations. Any continuous distribution can be approximated with arbitrary precision using a Gaussian Mixture Model (GMM), given enough components. Hence, instead of computing one set of statistics from the entire population of instances in the mini-batch, as Batch Normalization (BN) does, Mixture Normalization works on sub-populations, which are identified by disentangling the modes of the distribution estimated via the GMM.
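To make the role of the GMM concrete, the standard mixture density and the resulting soft assignment of a sample to each mode can be written as follows. The symbols ($K$, $\lambda_k$, $\mu_k$, $\Sigma_k$, $\tau_k$) are notation assumed here for illustration and are not taken from the text above.

```latex
% GMM density with K components and mixing weights \lambda_k
p(x) = \sum_{k=1}^{K} \lambda_k \, \mathcal{N}(x \mid \mu_k, \Sigma_k),
\qquad \lambda_k \ge 0, \quad \sum_{k=1}^{K} \lambda_k = 1

% Posterior probability (responsibility) that x belongs to component k
\tau_k(x) = \frac{\lambda_k \, \mathcal{N}(x \mid \mu_k, \Sigma_k)}
                 {\sum_{j=1}^{K} \lambda_j \, \mathcal{N}(x \mid \mu_j, \Sigma_j)}
```

Each $\tau_k(x)$ lies in $[0, 1]$ and the responsibilities of a sample sum to one, which is what allows every sample to contribute softly to the statistics of each sub-population.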
While BN can only scale and/or shift the whole underlying probability density function, Mixture Normalization operates like a soft piecewise normalizing transform, capable of completely restructuring the data distribution by independently scaling and/or shifting individual modes of the distribution.
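The contrast between the two transforms can be illustrated with a minimal sketch in plain Python. It is a simplified illustration, not the reference implementation: it assumes scalar activations, takes the GMM parameters as given instead of fitting them, and leaves out learnable affine parameters and any per-component rescaling that the original formulation may apply. All function names are hypothetical.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def responsibilities(xs, weights, means, variances):
    """Posterior p(k | x) of each GMM component for every sample."""
    result = []
    for x in xs:
        joint = [w * gaussian_pdf(x, m, v)
                 for w, m, v in zip(weights, means, variances)]
        total = sum(joint)
        result.append([j / total for j in joint])
    return result


def batch_norm(xs, eps=1e-5):
    """Plain batch normalization: one mean and variance for the whole batch."""
    mean = sum(xs) / len(xs)
    var = sum((x - mean) ** 2 for x in xs) / len(xs)
    return [(x - mean) / math.sqrt(var + eps) for x in xs]


def mixture_norm(xs, weights, means, variances, eps=1e-5):
    """Simplified mixture normalization (assumed form, see lead-in).

    Each component gets its own responsibility-weighted mean and variance;
    a sample's output is the responsibility-weighted sum of its
    per-component normalized values.
    """
    resp = responsibilities(xs, weights, means, variances)
    out = [0.0] * len(xs)
    for k in range(len(weights)):
        r_k = [r[k] for r in resp]
        mass = sum(r_k)
        if mass < 1e-12:
            continue  # component explains none of the batch; skip it
        mean_k = sum(r * x for r, x in zip(r_k, xs)) / mass
        var_k = sum(r * (x - mean_k) ** 2 for r, x in zip(r_k, xs)) / mass
        for i, x in enumerate(xs):
            out[i] += r_k[i] * (x - mean_k) / math.sqrt(var_k + eps)
    return out
```

Calling `mixture_norm` on a bimodal batch such as `[-5.2, -5.0, -4.8, 4.8, 5.0, 5.2]` with two equally weighted unit-variance components centred at -5 and 5 maps both clusters onto the same normalized range, whereas `batch_norm` only rescales the batch and leaves the two modes separated.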