VAE and MLE
Now I’m sure that all my VAE- and ELBO-related learning was NOT recorded in this blog. So it’s time to review some courses from Hung-yi.
1 Maximum Likelihood Estimation
A good intuition about MLE comes from StatQuest. Here is a quick review before we jump into VAE. The likelihood
of the observed data is the key to this method.
The intuition is to find a way to maximize this likelihood by choosing a proper mean
and a proper variance.
Mathematically, with a series of observed data points, we can estimate the parameters of the distribution.
For the mean, setting the derivative of the log-likelihood to zero gives the estimate.
For the variance, the same trick applies.
So the result is a pair of well-known formulas.
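To keep this note self-contained, here is the derivation written out, assuming the data $x_1, \dots, x_n$ are modeled as i.i.d. samples from a single Gaussian (the StatQuest setup):

$$
\log L(\mu, \sigma^2) = \sum_{i=1}^{n} \log \mathcal{N}(x_i \mid \mu, \sigma^2)
= -\frac{n}{2}\log(2\pi\sigma^2) - \sum_{i=1}^{n} \frac{(x_i - \mu)^2}{2\sigma^2}
$$

$$
\frac{\partial \log L}{\partial \mu} = 0 \;\Rightarrow\; \hat{\mu} = \frac{1}{n}\sum_{i=1}^{n} x_i
\qquad
\frac{\partial \log L}{\partial \sigma^2} = 0 \;\Rightarrow\; \hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n} (x_i - \hat{\mu})^2
$$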
Last but not least, the difference between probability and likelihood.
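In symbols, this is just the StatQuest framing restated: probability fixes the distribution and asks about the data, while likelihood fixes the data and asks about the parameters.

$$
\text{probability: } p(x \mid \theta) \text{ as a function of } x \text{ with } \theta \text{ fixed},
\qquad
\text{likelihood: } L(\theta \mid x) = p(x \mid \theta) \text{ as a function of } \theta \text{ with } x \text{ fixed}
$$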
2 Variational Auto Encoder
A VAE is an AE, but instead of encoding to a single embedded vector, it encodes to a mean and a variance.
The intuition is as follows:
- An AE only maps to discrete points in the latent space, so the points in between may have NO meaning
- A VAE maps to points with noise, so all the points within the noise range can be decoded back to the original image, with some variance
- So a point in between should have features of all the points around it, and it can generate a more meaningful picture
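A minimal sketch of that idea in PyTorch (my own toy code, not from the course; the layer sizes and dimensions are just placeholders): the encoder predicts a mean and a log-variance, and we sample a latent point inside the noise range with the reparameterization trick.

```python
import torch
import torch.nn as nn

class ToyVAEEncoder(nn.Module):
    """Toy encoder: maps an input vector to a mean and a log-variance."""
    def __init__(self, in_dim=784, latent_dim=16):
        super().__init__()
        self.backbone = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU())
        self.to_mean = nn.Linear(256, latent_dim)
        self.to_logvar = nn.Linear(256, latent_dim)

    def forward(self, x):
        h = self.backbone(x)
        return self.to_mean(h), self.to_logvar(h)

def sample_latent(mean, logvar):
    # Reparameterization trick: z = mean + std * noise keeps sampling differentiable.
    std = torch.exp(0.5 * logvar)
    eps = torch.randn_like(std)
    return mean + std * eps
```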
The loss function of a VAE has two parts. The first one is the reconstruction loss, the same as in an AE.
The second part essentially forces the variance towards 1 instead of 0. Otherwise the best way to minimize the loss would be the same as an AE, which is to have zero variance.
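A hedged sketch of those two terms, assuming a Gaussian posterior, a standard-normal prior, and a binary-cross-entropy reconstruction term (common for MNIST-style images; the exact choices are mine, not from the lecture):

```python
import torch
import torch.nn.functional as F

def vae_loss(x_recon, x, mean, logvar):
    # Part 1: reconstruction loss, same spirit as a plain AE
    # (binary cross-entropy assumes pixel values in [0, 1]).
    recon = F.binary_cross_entropy(x_recon, x, reduction="sum")
    # Part 2: closed-form KL(q(z|x) || N(0, I)); it penalizes variances
    # far from 1 and means far from 0, preventing the zero-variance collapse.
    kl = -0.5 * torch.sum(1 + logvar - mean.pow(2) - logvar.exp())
    return recon + kl
```

If the KL term were dropped, the optimum would indeed be zero variance, which is exactly the degenerate AE behaviour described above.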
3 Some Math Details
Any distribution can be approximated by a Gaussian Mixture Model. The conditional probability $p(x|z)$ is a normal distribution with mean and variance picked from a finite list (one pair per mixture component).
Instead of using m discrete clusters, we can use a continuous normal latent variable z, and $p(x|z)$ is also a normal distribution, with mean and variance generated by a NN (the decoder).
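In symbols (the standard VAE setup, stated here just to make the jump from discrete to continuous explicit):

$$
\text{GMM: } p(x) = \sum_{m} p(m)\,\mathcal{N}\!\left(x \mid \mu_m, \Sigma_m\right)
\qquad\longrightarrow\qquad
\text{VAE: } p(x) = \int_z p(z)\,\mathcal{N}\!\left(x \mid \mu(z), \sigma(z)\right) dz,
\quad z \sim \mathcal{N}(0, I)
$$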
In order to estimate the mean and variance, we would like to maximize the likelihood of the observed data, so it’s MLE again.
The introduced $q(z|x)$ is actually our encoder. We can rewrite the likelihood formula with q, given that the integral of $q(z|x)$ over z is always 1, so $\log p(x) = \int_z q(z|x) \log p(x)\, dz$.
After some math, the problem becomes one of increasing the lower bound.
But increasing the lower bound does NOT guarantee increasing the likelihood, unless we also minimize the KL between $q(z|x)$ and the true posterior $p(z|x)$, which brings the likelihood close to the lower bound.
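For reference, the decomposition behind this (a standard identity, not something new from my notes):

$$
\log p(x)
= \underbrace{\int_z q(z|x) \log\frac{p(x|z)\,p(z)}{q(z|x)}\,dz}_{\text{lower bound } L_b}
\;+\; \underbrace{\mathrm{KL}\!\left(q(z|x)\,\|\,p(z|x)\right)}_{\geq\,0}
$$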
Now let’s break down the lower bound. It has two parts:
- Minimizing the KL between $q(z|x)$ and the prior $p(z)$, i.e. pushing $q(z|x)$ towards the normal distribution $p(z)$
- Maximizing the second part, which essentially makes the decoder output close to the sampled x; this is the reconstruction loss part
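Written out, with the same standard notation as above:

$$
L_b = -\,\mathrm{KL}\!\left(q(z|x)\,\|\,p(z)\right)
\;+\; \mathbb{E}_{z \sim q(z|x)}\!\left[\log p(x|z)\right]
$$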
4 Problem of VAE
The main issue of VAE is that it only tries to synthesize images that look like the training samples, but never really tries to generate new ones. The solution to this problem is the GAN, which adds a discriminator to push the model towards actually generating new images.
A bit sad that I never took any notes when I learnt GAN, which is the coolest thing since the invention of “sliced bread”, according to LeCun. GANs are a bit outdated these days, so I don’t think I will review them any time soon. But that’s where I first learned about KL Divergence, thanks to Ian and his GAN paper.