What is: Beta-VAE?

Source: beta-VAE: Learning Basic Visual Concepts with a Constrained Variational Framework
Year: 2017
Data Source: CC BY-SA - https://paperswithcode.com

Beta-VAE is a type of variational autoencoder that seeks to discover disentangled latent factors. It modifies VAEs with an adjustable hyperparameter $\beta$ that balances latent channel capacity and independence constraints against reconstruction accuracy. The idea is to maximize the probability of generating the real data while keeping the distance between the real and estimated distributions small, under a threshold $\epsilon$. Using the Karush-Kuhn-Tucker (KKT) conditions, we can write this constrained optimization as a single equation:

\mathcal{F}\left(\theta, \phi, \beta; \mathbf{x}, \mathbf{z}\right) = \mathbb{E}\_{q\_{\phi}\left(\mathbf{z}\mid\mathbf{x}\right)}\left[\log{p}\_{\theta}\left(\mathbf{x}\mid\mathbf{z}\right)\right] - \beta\left(D\_{KL}\left(q\_{\phi}\left(\mathbf{z}\mid\mathbf{x}\right)\,\|\,p\left(\mathbf{z}\right)\right) - \epsilon\right)

where the KKT multiplier $\beta$ is the regularization coefficient that constrains the capacity of the latent channel $\mathbf{z}$ and puts implicit independence pressure on the learnt posterior due to the isotropic nature of the Gaussian prior $p\left(\mathbf{z}\right)$.
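
To see where this independence pressure comes from, note that for the usual diagonal-Gaussian posterior $q\_{\phi}\left(\mathbf{z}\mid\mathbf{x}\right) = \mathcal{N}\left(\boldsymbol{\mu}, \operatorname{diag}\left(\boldsymbol{\sigma}^2\right)\right)$, the KL term against the isotropic prior $p\left(\mathbf{z}\right) = \mathcal{N}\left(\mathbf{0}, \mathbf{I}\right)$ decomposes into a sum of per-dimension penalties (a standard identity, not spelled out in the source):

D\_{KL}\left(q\_{\phi}\left(\mathbf{z}\mid\mathbf{x}\right)\,\|\,p\left(\mathbf{z}\right)\right) = \frac{1}{2}\sum\_{j=1}^{d}\left(\mu\_j^2 + \sigma\_j^2 - \log{\sigma\_j^2} - 1\right)

Each latent dimension is penalized independently, so increasing $\beta$ pushes the posterior toward factorized, uncorrelated latents.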

Applying the complementary slackness assumption (which lets us drop the constant $\epsilon$ term), we arrive at the Beta-VAE formulation:

\mathcal{F}\left(\theta, \phi, \beta; \mathbf{x}, \mathbf{z}\right) \geq \mathcal{L}\left(\theta, \phi, \beta; \mathbf{x}, \mathbf{z}\right) = \mathbb{E}\_{q\_{\phi}\left(\mathbf{z}\mid\mathbf{x}\right)}\left[\log{p}\_{\theta}\left(\mathbf{x}\mid\mathbf{z}\right)\right] - \beta\, D\_{KL}\left(q\_{\phi}\left(\mathbf{z}\mid\mathbf{x}\right)\,\|\,p\left(\mathbf{z}\right)\right)
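
In practice this objective is minimized with its sign flipped: a reconstruction loss plus a $\beta$-weighted KL penalty. Below is a minimal PyTorch sketch of that loss; the function name `beta_vae_loss` and the choice of a Bernoulli decoder likelihood (binary cross-entropy) are illustrative assumptions, not prescribed by the source.

```python
import torch
import torch.nn.functional as F

def beta_vae_loss(x, x_recon, mu, log_var, beta=4.0):
    """Beta-VAE objective: reconstruction term plus beta-weighted KL.

    Assumes a Gaussian encoder q_phi(z|x) = N(mu, diag(exp(log_var)))
    and an isotropic unit-Gaussian prior p(z) = N(0, I). Inputs x and
    x_recon are expected to lie in [0, 1] for the Bernoulli likelihood.
    """
    # Reconstruction term: -E_q[log p_theta(x|z)] under a Bernoulli
    # decoder, i.e. binary cross-entropy summed over all pixels.
    recon = F.binary_cross_entropy(x_recon, x, reduction="sum")

    # Closed-form KL divergence D_KL(q_phi(z|x) || p(z)) for a
    # diagonal Gaussian posterior against the N(0, I) prior.
    kl = -0.5 * torch.sum(1 + log_var - mu.pow(2) - log_var.exp())

    # beta > 1 strengthens the capacity/independence constraint on z;
    # beta = 1 recovers the standard VAE ELBO.
    return recon + beta * kl
```

Setting `beta=1` recovers the standard VAE, while values greater than 1 were found in the paper to encourage disentangled representations at some cost in reconstruction quality.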