**Mixture Normalization** is a normalization technique that relies on an approximation of the probability density function of the internal representations. Any continuous distribution can be approximated with arbitrary precision using a Gaussian Mixture Model (GMM). Hence, instead of computing a single set of statistics from the entire population of instances in the mini-batch, as [Batch Normalization](https://paperswithcode.com/method/batch-normalization) does, Mixture Normalization works on sub-populations, which are identified by disentangling the modes of the distribution as estimated by a GMM.

While BN can only scale and/or shift the whole underlying probability density function, Mixture Normalization operates like a soft piecewise normalizing transform, capable of completely restructuring the data distribution by independently scaling and/or shifting the individual modes of the distribution.
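
The idea of normalizing each mode separately can be illustrated with a minimal sketch for scalar activations. It assumes the GMM parameters (`weights`, `means`, `variances`) have already been estimated, for example by EM, and the function names are hypothetical. It is a simplified reading of the description above, not the authors' implementation: it omits the learnable affine parameters and any additional scaling terms the original formulation may use.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def responsibilities(x, weights, means, variances):
    """Soft assignment of x to each mixture component (posterior probabilities)."""
    joint = [w * gaussian_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
    total = sum(joint)
    return [j / total for j in joint]


def mixture_normalize(batch, weights, means, variances, eps=1e-5):
    """Standardize each sample against every component, weighted by responsibility.

    Per-component means and variances are re-estimated from the mini-batch
    using the responsibilities as soft membership weights.
    """
    resp = [responsibilities(x, weights, means, variances) for x in batch]
    out = [0.0] * len(batch)
    for k in range(len(weights)):
        r_k = [r[k] for r in resp]
        mass = sum(r_k)
        mu_k = sum(r * x for r, x in zip(r_k, batch)) / mass
        var_k = sum(r * (x - mu_k) ** 2 for r, x in zip(r_k, batch)) / mass
        scale = math.sqrt(var_k + eps)
        for i, x in enumerate(batch):
            out[i] += r_k[i] * (x - mu_k) / scale
    return out


# A bimodal mini-batch: two tight clusters around -5 and +5.
batch = [-5.1, -4.9, -5.0, 5.0, 4.9, 5.1]
normalized = mixture_normalize(batch, weights=[0.5, 0.5], means=[-5.0, 5.0], variances=[1.0, 1.0])
print(normalized)
```

In this toy batch, each cluster is centered and scaled on its own, so the within-cluster variation is preserved after normalization. A single batch-wide mean and variance would instead map the samples to values near -1 and +1 and largely erase that variation.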

**Multi-band MelGAN**, or **MB-MelGAN**, is a waveform generation model focusing on high-quality text-to-speech. It improves the original [MelGAN](https://paperswithcode.com/method/melgan) in several ways. First, it increases the receptive field of the generator, which has been shown to benefit speech generation. Second, it replaces the feature matching loss with a multi-resolution STFT loss to better measure the difference between generated and real speech. Lastly, [MelGAN](https://paperswithcode.com/method/melgan) is extended with multi-band processing: the generator takes mel-spectrograms as input and produces sub-band signals, which are subsequently summed back into full-band signals that serve as the discriminator input.
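
A multi-resolution STFT loss can be sketched as follows. The description above does not spell out the loss terms, so this sketch assumes the commonly used formulation: a spectral convergence term plus a log-magnitude L1 term, averaged over several STFT resolutions. The function names and the `(fft_size, hop_size)` pairs are illustrative choices, and the naive DFT is used only to keep the example free of dependencies; a practical implementation would use an FFT library.

```python
import cmath
import math


def stft_magnitude(signal, fft_size, hop_size):
    """Magnitude spectrogram (frames x bins) using a Hann window and a naive DFT."""
    window = [0.5 - 0.5 * math.cos(2.0 * math.pi * n / fft_size) for n in range(fft_size)]
    frames = []
    for start in range(0, len(signal) - fft_size + 1, hop_size):
        frame = [signal[start + n] * window[n] for n in range(fft_size)]
        mags = []
        for k in range(fft_size // 2 + 1):
            acc = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / fft_size)
                      for n in range(fft_size))
            mags.append(abs(acc))
        frames.append(mags)
    return frames


def stft_loss(real, fake, fft_size, hop_size, eps=1e-7):
    """Spectral convergence plus mean log-magnitude L1 distance at one resolution.

    Both signals must have the same length, at least fft_size samples.
    """
    mag_real = stft_magnitude(real, fft_size, hop_size)
    mag_fake = stft_magnitude(fake, fft_size, hop_size)
    pairs = [(r, f) for fr, ff in zip(mag_real, mag_fake) for r, f in zip(fr, ff)]
    diff_sq = sum((r - f) ** 2 for r, f in pairs)
    ref_sq = sum(r ** 2 for r, _ in pairs)
    spectral_convergence = math.sqrt(diff_sq) / (math.sqrt(ref_sq) + eps)
    log_magnitude = sum(abs(math.log(r + eps) - math.log(f + eps)) for r, f in pairs) / len(pairs)
    return spectral_convergence + log_magnitude


def multi_resolution_stft_loss(real, fake, resolutions=((16, 4), (32, 8), (64, 16))):
    """Average the single-resolution loss over several (fft_size, hop_size) pairs."""
    return sum(stft_loss(real, fake, n, h) for n, h in resolutions) / len(resolutions)


# Toy signals: two sinusoids of different frequency stand in for real and generated speech.
real = [math.sin(2.0 * math.pi * 5 * t / 128) for t in range(128)]
fake = [math.sin(2.0 * math.pi * 9 * t / 128) for t in range(128)]
print(multi_resolution_stft_loss(real, fake))
```

Using several resolutions means that a mismatch hidden by one time-frequency trade-off (for example, a short window with poor frequency resolution) can still be penalized at another.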

Multi-band MelGAN

Multi-band MelGAN: Faster Waveform Generation for High-Quality Text-to-Speech

Mixture Normalization

Training Faster by Separating Modes of Variation in Batch-normalized Models

**VC R-CNN** is an unsupervised feature representation learning method that uses a Region-based Convolutional Neural Network ([R-CNN](https://paperswithcode.com/method/r-cnn)) as the visual backbone and causal intervention as the training objective. Given a set of detected object regions in an image (e.g., from [Faster R-CNN](https://paperswithcode.com/method/faster-r-cnn)), the proxy training objective of VC R-CNN, as in other unsupervised feature learning methods (e.g., word2vec), is to predict the contextual objects of a region. The two are fundamentally different, however: VC R-CNN makes its prediction using causal intervention, P(Y|do(X)), whereas other methods use the conventional likelihood, P(Y|X). This is the core reason why VC R-CNN can learn "sense-making" knowledge, such as that a chair can be sat on, rather than only "common" co-occurrences, such as that a chair is likely to be present when a table is observed.
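
The gap between P(Y|X) and P(Y|do(X)) can be made concrete with a toy calculation. The sketch below assumes a single observed binary confounder Z that satisfies the backdoor criterion, so that P(Y|do(X)) = Σ_z P(Y|X, z) P(z). All probability tables and function names are made up for illustration and are not taken from the VC R-CNN paper.

```python
# Hypothetical toy model: Z influences both X and Y, while X has no effect on Y.
p_z = {0: 0.5, 1: 0.5}                      # P(Z = z)
p_x1_given_z = {0: 0.1, 1: 0.9}             # P(X = 1 | Z = z)
p_y1_given_xz = {                           # P(Y = 1 | X = x, Z = z), depends on z only
    (0, 0): 0.1, (0, 1): 0.5,
    (1, 0): 0.1, (1, 1): 0.5,
}


def p_x_given_z(x, z):
    """P(X = x | Z = z) for binary X."""
    return p_x1_given_z[z] if x == 1 else 1.0 - p_x1_given_z[z]


def observational(x):
    """P(Y = 1 | X = x): weights each z by P(z | x), so confounding leaks in."""
    p_x = sum(p_x_given_z(x, z) * p_z[z] for z in p_z)
    return sum(p_y1_given_xz[(x, z)] * p_x_given_z(x, z) * p_z[z] / p_x for z in p_z)


def interventional(x):
    """P(Y = 1 | do(X = x)) via backdoor adjustment: weights each z by P(z)."""
    return sum(p_y1_given_xz[(x, z)] * p_z[z] for z in p_z)


for x in (0, 1):
    print(x, observational(x), interventional(x))
```

In this toy model, the observational probability of Y changes with X only because Z drives both, while the interventional probability is the same for both values of X. This is the kind of spurious co-occurrence that an intervention-based objective is meant to discount.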

Source	Training Faster by Separating Modes of Variation in Batch-normalized Models
Year	2000
Data Source	CC BY-SA - https://paperswithcode.com