Modeling assumptions and evaluation schemes: On the assessment of deep latent variable models

Recent findings indicate that deep generative models can assign unreasonably high likelihoods to out-of-distribution data points. In applications such as autonomous driving, medicine, and robotics in particular, such overconfident likelihood estimates can have detrimental consequences. In this work, we argue that two factors contribute to these findings: 1) modeling assumptions, such as the choice of the likelihood function, and 2) evaluation under local posterior distributions versus the global prior distribution. We demonstrate experimentally how both mechanisms can bias the likelihood estimates of variational autoencoders.
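
To make the posterior-versus-prior distinction concrete, the sketch below (not taken from the paper; the `encoder`/`decoder` interfaces are hypothetical stand-ins, assuming a standard-normal latent prior) contrasts a Monte Carlo estimate of log p(x) that draws latents from the global prior with an importance-weighted estimate that draws them from the local approximate posterior q(z|x). The two estimators target the same quantity but can behave very differently at finite sample sizes, which is one way the evaluation scheme can bias reported likelihoods.

```python
# Minimal sketch, assuming a VAE with prior p(z) = N(0, I) and a decoder
# that returns log p(x | z). Interfaces are illustrative, not the paper's code.
import math
import torch
import torch.distributions as D

def log_px_prior(x, decoder, latent_dim, n_samples=1000):
    """Estimate log p(x) with latents from the global prior:
    log p(x) ~ log( mean_k p(x | z_k) ),  z_k ~ N(0, I)."""
    z = torch.randn(n_samples, latent_dim)
    log_px_given_z = decoder(x, z)  # hypothetical: returns [n_samples] log-likelihoods
    return torch.logsumexp(log_px_given_z, dim=0) - math.log(n_samples)

def log_px_posterior(x, encoder, decoder, n_samples=1000):
    """Importance-weighted estimate with latents from the local posterior:
    log p(x) ~ log( mean_k p(x | z_k) p(z_k) / q(z_k | x) ),  z_k ~ q(z | x)."""
    mu, sigma = encoder(x)  # hypothetical: returns mean and std of q(z | x)
    q = D.Normal(mu, sigma)
    z = q.rsample((n_samples,))  # [n_samples, latent_dim]
    prior = D.Normal(torch.zeros_like(mu), torch.ones_like(sigma))
    # log importance weights: log p(x | z) + log p(z) - log q(z | x)
    log_w = decoder(x, z) + prior.log_prob(z).sum(-1) - q.log_prob(z).sum(-1)
    return torch.logsumexp(log_w, dim=0) - math.log(n_samples)
```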