What is: Residual Normal Distribution?

Residual Normal Distributions are used to stabilize the optimization of VAEs by keeping training out of unstable regions. Instability can arise from sharp gradients when the encoder and decoder produce distributions that are far apart. The residual distribution parameterizes the approximate posterior $q\left(\mathbf{z}|\mathbf{x}\right)$ relative to the prior $p\left(\mathbf{z}\right)$. Let $p\left(z^{i}_{l}|\mathbf{z}_{<l}\right) := N\left(\mu_{i}\left(\mathbf{z}_{<l}\right), \sigma_{i}\left(\mathbf{z}_{<l}\right)\right)$ be a Normal distribution for the $i$th variable in $\mathbf{z}_{l}$ in the prior. Define $q\left(z^{i}_{l}|\mathbf{z}_{<l}, \mathbf{x}\right) := N\left(\mu_{i}\left(\mathbf{z}_{<l}\right) + \Delta\mu_{i}\left(\mathbf{z}_{<l}, \mathbf{x}\right), \sigma_{i}\left(\mathbf{z}_{<l}\right) \cdot \Delta\sigma_{i}\left(\mathbf{z}_{<l}, \mathbf{x}\right)\right)$, where $\Delta\mu_{i}\left(\mathbf{z}_{<l}, \mathbf{x}\right)$ and $\Delta\sigma_{i}\left(\mathbf{z}_{<l}, \mathbf{x}\right)$ are the relative location and scale of the approximate posterior with respect to the prior. With this parameterization, when the prior moves, the approximate posterior moves with it as long as the residual terms $\Delta\mu_{i}$ and $\Delta\sigma_{i}$ are left unchanged.
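The parameterization above can be sketched in a few lines of NumPy. This is a minimal illustration, not the NVAE implementation: the function names (`residual_posterior`, `kl_residual`, `kl_gaussian`) are hypothetical, and $\sigma$ is assumed to denote a standard deviation. Substituting the residual form into the standard KL divergence between two univariate Normals gives $\frac{1}{2}\left(\frac{\Delta\mu_{i}^{2}}{\sigma_{i}^{2}} + \Delta\sigma_{i}^{2} - \log \Delta\sigma_{i}^{2} - 1\right)$, which the sketch checks numerically against the general formula.

```python
import numpy as np


def residual_posterior(mu_p, sigma_p, delta_mu, delta_sigma):
    """Mean and std of q, expressed relative to the prior (mu_p, sigma_p)."""
    return mu_p + delta_mu, sigma_p * delta_sigma


def kl_residual(sigma_p, delta_mu, delta_sigma):
    """Per-dimension KL(q || p) written only in terms of the residuals."""
    return 0.5 * (
        delta_mu**2 / sigma_p**2
        + delta_sigma**2
        - np.log(delta_sigma**2)
        - 1.0
    )


def kl_gaussian(mu_q, sigma_q, mu_p, sigma_p):
    """General per-dimension KL between two univariate Normals (std-dev form)."""
    return (
        np.log(sigma_p / sigma_q)
        + (sigma_q**2 + (mu_q - mu_p) ** 2) / (2.0 * sigma_p**2)
        - 0.5
    )


# Illustrative values only (assumed, not taken from the paper).
mu_p = np.array([0.0, 1.5])
sigma_p = np.array([1.0, 2.0])
delta_mu = np.array([0.3, -0.5])
delta_sigma = np.array([0.8, 1.2])

mu_q, sigma_q = residual_posterior(mu_p, sigma_p, delta_mu, delta_sigma)
print("KL (residual form):", kl_residual(sigma_p, delta_mu, delta_sigma))
print("KL (general form): ", kl_gaussian(mu_q, sigma_q, mu_p, sigma_p))

# Shifting the prior mean shifts the posterior mean by the same amount,
# and the KL term is unaffected because it does not depend on mu_p.
shifted_mu_q, _ = residual_posterior(mu_p + 10.0, sigma_p, delta_mu, delta_sigma)
print("Posterior mean shift:", shifted_mu_q - mu_q)
```

Under these assumptions, the residual form makes the KL term depend on the prior only through $\sigma_{i}$, and zero residuals ($\Delta\mu_{i} = 0$, $\Delta\sigma_{i} = 1$) give a KL of exactly zero.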

Source	NVAE: A Deep Hierarchical Variational Autoencoder
Year	2020
Data Source	CC BY-SA - https://paperswithcode.com