Leveraging the Exact Likelihood of Deep Latent Variable Models

Deep latent variable models (DLVMs) combine the approximation power of deep neural networks with the statistical foundations of generative models. Variational methods are commonly used for inference; however, the exact likelihood of these models has been largely overlooked. The purpose of this work is to study the general properties of this quantity and to show how they can be leveraged in practice. We focus on important inferential problems that rely on the likelihood: estimation and missing data imputation. First, we investigate maximum likelihood estimation for DLVMs: in particular, we show that most unconstrained models used for continuous data have an unbounded likelihood function. We demonstrate that this problematic behaviour is a source of mode collapse. We also show how to ensure the existence of maximum likelihood estimates, and draw useful connections with nonparametric mixture models. Finally, we describe an algorithm for missing data imputation using the exact conditional likelihood of a deep latent variable model. On several data sets, our algorithm consistently and significantly outperforms the usual imputation scheme used for DLVMs.
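
To make the abstract's two technical claims concrete, here is a minimal sketch in generic notation (the symbols below are ours, not taken from the paper). For continuous data, a DLVM with a Gaussian decoder defines the marginal likelihood and log-likelihood

\[
p_\theta(x) = \int \mathcal{N}\!\bigl(x \mid \mu_\theta(z),\, \sigma_\theta^2(z)\, I\bigr)\, p(z)\, \mathrm{d}z,
\qquad
\ell(\theta) = \sum_{i=1}^{n} \log p_\theta(x_i).
\]

If the decoder networks are flexible enough to drive \(\mu_\theta(z)\) towards a training point \(x_1\) while \(\sigma_\theta^2(z) \to 0\) on a latent set of positive prior mass, then \(p_\theta(x_1) \to \infty\), so \(\sup_\theta \ell(\theta) = +\infty\) and an unconstrained maximum likelihood estimate cannot exist, echoing the classical unboundedness of unconstrained Gaussian mixture likelihoods. Similarly, writing each observation as \(x = (x^{\mathrm{obs}}, x^{\mathrm{miss}})\), imputation can target the exact conditional likelihood

\[
p_\theta\bigl(x^{\mathrm{miss}} \mid x^{\mathrm{obs}}\bigr)
= \frac{\int p_\theta\bigl(x^{\mathrm{obs}}, x^{\mathrm{miss}} \mid z\bigr)\, p(z)\, \mathrm{d}z}{\int p_\theta\bigl(x^{\mathrm{obs}} \mid z\bigr)\, p(z)\, \mathrm{d}z},
\]

rather than the single encode-decode pass commonly used with variational autoencoders.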
