Samples from the Hierarchical VAE

Each of the following plots shows samples from the conditional VAE that I'm using for the inpainting task. As expected of a VAE, they're blurry. However, the fun thing about having a hierarchy of latent variables is that I can freeze all the layers except one and vary only that one to see the type of noise it models. The pictures are generated by using the mean $\mu_{z_l}(z_{l-1})$ at every layer except the $i$-th, where $z_i$ is sampled instead.
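The freeze-all-but-one procedure can be sketched as a single top-down pass. This is a minimal sketch under stated assumptions, not the model behind these plots: each layer is assumed to output the mean and standard deviation of a diagonal Gaussian $p(z_l \mid z_{l-1})$, the learned networks are replaced by hypothetical random affine maps (`make_layer`), `z0` is a hypothetical stand-in for the encoding of the visible context, and the image decoder is omitted.

```python
import math
import random


def make_layer(dim, rng):
    """Hypothetical stand-in for a learned top-down network: maps z_{l-1} to
    the mean and std of a diagonal Gaussian p(z_l | z_{l-1})."""
    w = [[rng.gauss(0.0, 0.5) for _ in range(dim)] for _ in range(dim)]
    b = [rng.gauss(0.0, 0.1) for _ in range(dim)]

    def layer(z_prev):
        mu = [math.tanh(sum(w[r][c] * z_prev[c] for c in range(dim)) + b[r])
              for r in range(dim)]
        sigma = [0.5] * dim  # fixed std, purely for illustration
        return mu, sigma

    return layer


def sample_varying_layer(layers, z0, i, rng):
    """Top-down pass that takes the mean mu_{z_l}(z_{l-1}) at every layer
    except layer i (1-indexed), where z_i ~ N(mu, sigma^2) is sampled."""
    z = z0
    path = []
    for l, layer in enumerate(layers, start=1):
        mu, sigma = layer(z)
        if l == i:
            # The only source of randomness in the whole pass.
            z = [m + s * rng.gauss(0.0, 1.0) for m, s in zip(mu, sigma)]
        else:
            z = mu
        path.append(z)
    return path
```

Drawing several samples with the same `i` and different random seeds gives latents that agree exactly above layer $i$ and differ from layer $i$ downwards, so any variation in the decoded images can only come from what layer $i$ models.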

[Figure: samples with every layer fixed to its mean except layer $i$, one panel each for $i=1$ to $i=7$]

At the lower levels, it's not surprising that the variation we see is texture and pixel noise. However, as we go up the layers, it becomes clear that the model has not learnt any semantic variation of the image; it only seems to vary how far the colours are 'diffused' into the inpainted part of the image.

More interestingly, the model overfits quickly: for some examples in the validation set, the KL-divergence terms at certain layers become huge, and the negative lower bound on the validation set soon starts to rise while the one on the training data keeps dropping. It's been suggested to me that this may be due to an insufficiently powerful prior; I'll have to look into improving the model in that respect.
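To see which layers are responsible, it helps to log each layer's KL term separately instead of only the total bound. The sketch below assumes (the post does not say) that both the approximate posterior and the prior at every layer are diagonal Gaussians given as `(mu, sigma)` pairs, and that the reconstruction term `recon_nll` $= -\log p(x \mid z)$ is computed elsewhere; the function names are hypothetical.

```python
import math


def gaussian_kl(mu_q, sigma_q, mu_p, sigma_p):
    """KL(N(mu_q, diag sigma_q^2) || N(mu_p, diag sigma_p^2)), summed over dims."""
    return sum(
        math.log(sp / sq) + (sq ** 2 + (mq - mp) ** 2) / (2.0 * sp ** 2) - 0.5
        for mq, sq, mp, sp in zip(mu_q, sigma_q, mu_p, sigma_p)
    )


def negative_elbo(recon_nll, posteriors, priors):
    """Negative lower bound: -log p(x | z) + sum_l KL(q(z_l) || p(z_l)).

    posteriors, priors: lists of (mu, sigma) pairs, one per layer.
    Returns the total and the list of per-layer KL terms."""
    kls = [gaussian_kl(mq, sq, mp, sp)
           for (mq, sq), (mp, sp) in zip(posteriors, priors)]
    return recon_nll + sum(kls), kls
```

The $(\mu_q - \mu_p)^2 / (2\sigma_p^2)$ term is what makes a single layer's KL explode: when the prior at that layer is narrow and its mean cannot follow the posterior on an unseen example, the penalty grows quadratically in the gap.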

@misc{tan2017-04-01,
  title        = {Samples from the Hierarchical VAE},
  author       = {Tan, Shawn},
  howpublished = {\url{https://blog.wtf.sg/2017/04/02/samples-from-the-hierarchical-vae/}},
  year         = {2017}
}