
What is: Beta-VAE?

Source: beta-VAE: Learning Basic Visual Concepts with a Constrained Variational Framework
Year: 2017
Data Source: CC BY-SA - https://paperswithcode.com

Beta-VAE is a type of variational autoencoder that seeks to discover disentangled latent factors. It modifies VAEs with an adjustable hyperparameter $\beta$ that balances latent channel capacity and independence constraints against reconstruction accuracy. The idea is to maximize the probability of generating the real data while keeping the distance between the real and estimated distributions small, under a threshold $\epsilon$. Concretely, the original paper states this as a constrained optimization problem:
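
\max\_{\phi, \theta} \mathbb{E}\_{\mathbf{x}\sim\mathbf{D}}\left[\mathbb{E}\_{q\_{\phi}\left(\mathbf{z}\mid\mathbf{x}\right)}\left[\log{p}\_{\theta}\left(\mathbf{x}\mid\mathbf{z}\right)\right]\right] \quad \text{subject to} \quad D\_{KL}\left(q\_{\phi}\left(\mathbf{z}\mid\mathbf{x}\right) \| p\left(\mathbf{z}\right)\right) < \epsilon

where $\mathbf{D}$ is the training data distribution. Using the Karush-Kuhn-Tucker (KKT) conditions, we can write this as a single equation: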

\mathcal{F}\left(\theta, \phi, \beta; \mathbf{x}, \mathbf{z}\right) = \mathbb{E}\_{q\_{\phi}\left(\mathbf{z}\mid\mathbf{x}\right)}\left[\log{p}\_{\theta}\left(\mathbf{x}\mid\mathbf{z}\right)\right] - \beta\left(D\_{KL}\left(q\_{\phi}\left(\mathbf{z}\mid\mathbf{x}\right) \| p\left(\mathbf{z}\right)\right) - \epsilon\right)

where the KKT multiplier $\beta$ is the regularization coefficient that constrains the capacity of the latent channel $\mathbf{z}$ and puts implicit independence pressure on the learnt posterior due to the isotropic nature of the Gaussian prior $p\left(\mathbf{z}\right)$.
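
For the common choice (assumed here for illustration, not stated on this page) of a diagonal Gaussian posterior $q\_{\phi}\left(\mathbf{z}\mid\mathbf{x}\right) = \mathcal{N}\left(\boldsymbol{\mu}, \operatorname{diag}\left(\boldsymbol{\sigma}^{2}\right)\right)$ and the standard normal prior $p\left(\mathbf{z}\right) = \mathcal{N}\left(\mathbf{0}, I\right)$, this KL term has the closed form

D\_{KL}\left(q\_{\phi}\left(\mathbf{z}\mid\mathbf{x}\right) \| p\left(\mathbf{z}\right)\right) = \frac{1}{2}\sum\_{j}\left(\mu\_{j}^{2} + \sigma\_{j}^{2} - \log{\sigma\_{j}^{2}} - 1\right)

so increasing $\beta$ pushes each latent dimension independently toward the unit Gaussian, which is where the independence pressure comes from.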

Since $\beta, \epsilon \geq 0$, the complementary slackness KKT condition lets us rewrite this as the Beta-VAE formulation:

\mathcal{F}\left(\theta, \phi, \beta; \mathbf{x}, \mathbf{z}\right) \geq \mathcal{L}\left(\theta, \phi, \beta; \mathbf{x}, \mathbf{z}\right) = \mathbb{E}\_{q\_{\phi}\left(\mathbf{z}\mid\mathbf{x}\right)}\left[\log{p}\_{\theta}\left(\mathbf{x}\mid\mathbf{z}\right)\right] - \beta D\_{KL}\left(q\_{\phi}\left(\mathbf{z}\mid\mathbf{x}\right) \| p\left(\mathbf{z}\right)\right)
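
To make this concrete, here is a minimal PyTorch sketch of the resulting training loss (negated, so it can be minimized). The Bernoulli decoder (binary cross-entropy reconstruction term), the diagonal Gaussian posterior parameterized by `mu` and `log_var`, and the function name `beta_vae_loss` with its default `beta=4.0` are illustrative assumptions, not taken from this page:

```python
import torch
import torch.nn.functional as F


def beta_vae_loss(x, x_recon, mu, log_var, beta=4.0):
    """Negative Beta-VAE objective -L(theta, phi, beta; x, z), to be minimized.

    x        : input batch with values in [0, 1], shape (B, ...)
    x_recon  : decoder output probabilities, same shape as x
    mu       : mean of the diagonal Gaussian posterior q_phi(z|x), shape (B, D)
    log_var  : log-variance of that posterior, shape (B, D)
    beta     : KKT multiplier; beta = 1 recovers the standard VAE ELBO
    """
    batch_size = x.size(0)

    # -E_q[log p_theta(x|z)] for a Bernoulli decoder, estimated with a
    # single posterior sample: binary cross-entropy summed over pixels.
    recon_nll = F.binary_cross_entropy(x_recon, x, reduction="sum") / batch_size

    # Closed-form D_KL(q_phi(z|x) || N(0, I)) for a diagonal Gaussian posterior.
    kl = -0.5 * torch.sum(1 + log_var - mu.pow(2) - log_var.exp()) / batch_size

    # beta > 1 tightens the capacity/independence constraint on z.
    return recon_nll + beta * kl
```

With $\beta = 1$ this reduces to the standard VAE objective; the paper finds that values of $\beta > 1$ are what encourage the model to learn disentangled latent factors, at some cost in reconstruction accuracy.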