Decision-Making with Auto-Encoding Variational Bayes

To make decisions based on a model fit with Auto-Encoding Variational Bayes (AEVB), practitioners typically use importance sampling to estimate a functional of the posterior distribution, with the variational distribution found by AEVB serving as the proposal. However, this proposal distribution may yield unreliable (high-variance) importance sampling estimates and hence poor decisions. We explore how changing the objective function for learning the variational distribution, while continuing to learn the generative model with the ELBO, affects the quality of downstream decisions. For a particular model, we characterize the error of importance sampling as a function of the posterior variance and show that proposal distributions learned with evidence upper bounds are better suited as proposals. Motivated by these theoretical results, we propose a novel variant of the VAE. In addition to experiments on MNIST, we present a full-fledged application of the proposed method to single-cell RNA sequencing. In this challenging instance of multiple hypothesis testing, the proposed method surpasses the current state of the art.
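
As a concrete illustration of the decision-making setup described above, the following minimal sketch (not the paper's implementation) uses self-normalized importance sampling with the variational distribution q(z|x) as the proposal to estimate a posterior functional E[f(z)|x]. The linear "decoder" W, the encoder outputs mu_q and s_q, and the functional f below are hypothetical stand-ins for a trained VAE and a task-specific decision quantity.

import numpy as np

rng = np.random.default_rng(0)
d_z, d_x, n_samples = 2, 5, 10_000

# Hypothetical trained generative model: p(z) = N(0, I), p(x | z) = N(W z, sigma^2 I).
W = rng.normal(size=(d_x, d_z))
sigma = 0.5

# Hypothetical encoder output for one observation x: q(z | x) = N(mu_q, diag(s_q^2)).
x = rng.normal(size=d_x)
mu_q = rng.normal(scale=0.1, size=d_z)
s_q = np.full(d_z, 0.8)

def log_gauss(v, mean, std):
    # Log density of a diagonal Gaussian, summed over the last axis.
    return -0.5 * np.sum(((v - mean) / std) ** 2 + 2 * np.log(std) + np.log(2 * np.pi), axis=-1)

# Sample from the proposal q(z | x) and form log importance weights log p(x, z) - log q(z | x).
z = mu_q + s_q * rng.normal(size=(n_samples, d_z))
log_w = (log_gauss(x, z @ W.T, sigma)   # log p(x | z)
         + log_gauss(z, 0.0, 1.0)       # log p(z)
         - log_gauss(z, mu_q, s_q))     # log q(z | x)

# Self-normalize the weights to obtain a stable estimator of E[f(z) | x].
w = np.exp(log_w - log_w.max())
w /= w.sum()

f = (z[:, 0] > 0.0).astype(float)       # example functional: posterior probability that z_1 > 0
print("estimated P(z_1 > 0 | x):", float(w @ f))
print("effective sample size:", float(1.0 / np.sum(w ** 2)))

When the proposal q(z|x) is lighter-tailed than the posterior, the effective sample size printed above can be very small; this is the failure mode that motivates learning the proposal with an evidence upper bound rather than the ELBO.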
