Deep generative modelling for amortised variational inference

Probabilistic and statistical modelling are the fundamental frameworks underlying a large proportion of modern machine learning (ML) techniques. They allow practitioners to develop tailor-made models that incorporate expert knowledge and learn from data. In the Bayesian framework, learning from data is referred to as inference. Model-specific inference methods are generally hard to derive, as they demand a high level of mathematical and statistical dexterity on the practitioner's part. As a result, a large community of researchers in ML and statistics works on automatic methods of inference (Carpenter et al., 2017; Tran et al., 2016; Kucukelbir et al., 2016; Ge et al., 2018; Salvatier et al., 2016; Uber, 2017; Lintusaari et al., 2018). These methods are generally model agnostic and are therefore called black-box inference. Recent work has shown that using deep learning techniques (Rezende and Mohamed, 2015b; Kingma et al., 2016; Srivastava and Sutton, 2017; Mescheder et al., 2017a) within the framework of variational inference (Jordan et al., 1999) not only allows for automatic and accurate inference but does so far more efficiently. The added efficiency comes from amortising the cost of learning: deep neural networks exploit the smoothness of the mapping between data points and their posterior parameters, so inference for a new data point reduces to a single forward pass through a trained network.

Deep-learning-based amortised variational inference is a relatively new field, and numerous challenges and issues must be tackled before it can be established as a standard method of inference. To this end, this thesis presents four pieces of original work in the domain of automatic amortised variational inference in statistical models. We first introduce two sets of techniques for amortising variational inference in Bayesian generative models such as Latent Dirichlet Allocation (Blei et al., 2003) and the Pachinko Allocation Machine (Li and McCallum, 2006). These techniques use deep neural networks and first-order stochastic-gradient optimisers for inference, and they apply generically to a large class of Bayesian generative models. We then introduce a novel variational framework for implicit generative models of data, called VEEGAN. This framework enables inference in statistical models where, unlike in Bayesian generative models, a prescribed likelihood function is not available; it uses a discriminator-based density ratio estimator (Sugiyama et al., 2012) to deal with the intractability of the likelihood. Implicit generative models such as generative adversarial networks (GANs; Goodfellow et al., 2014) suffer from learning issues such as mode collapse (Srivastava et al., 2017) and training instability (Arjovsky et al., 2017). We tackle mode collapse in GANs using VEEGAN.
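To make the amortisation idea concrete, the following is a minimal sketch in PyTorch of an amortised Gaussian inference network of the kind used in variational autoencoders (Kingma and Welling, 2013; Rezende et al., 2014). The layer sizes, the Bernoulli likelihood, and the `decoder` module are illustrative assumptions, not details taken from the thesis.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Encoder(nn.Module):
    """Amortised inference network: one shared network maps each data
    point x to the parameters of its approximate posterior q(z|x), so
    inference for a new point is a single forward pass rather than a
    per-point optimisation."""
    def __init__(self, x_dim=784, z_dim=20, h_dim=200):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(x_dim, h_dim), nn.ReLU())
        self.mu = nn.Linear(h_dim, z_dim)       # posterior mean
        self.log_var = nn.Linear(h_dim, z_dim)  # posterior log-variance

    def forward(self, x):
        h = self.body(x)
        return self.mu(h), self.log_var(h)

def negative_elbo(x, encoder, decoder):
    """Single-sample Monte Carlo estimate of the negative evidence lower
    bound, assuming a Bernoulli likelihood and a N(0, I) prior on z."""
    mu, log_var = encoder(x)
    # Reparameterisation trick: z = mu + sigma * eps with eps ~ N(0, I),
    # so gradients flow through the sampling step.
    z = mu + torch.exp(0.5 * log_var) * torch.randn_like(mu)
    recon = F.binary_cross_entropy(decoder(z), x, reduction="sum")
    # KL(q(z|x) || N(0, I)) has a closed form for a diagonal Gaussian.
    kl = -0.5 * torch.sum(1.0 + log_var - mu.pow(2) - log_var.exp())
    return recon + kl
```

Training minimises this bound jointly over encoder and decoder with a stochastic first-order optimiser such as Adam; the trained encoder then serves as the inference routine for every data point, which is where the amortisation of the learning cost comes from.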
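The density-ratio trick mentioned above can also be sketched briefly. A classifier trained to separate samples of p from samples of q approaches the Bayes-optimal discriminator D*(x) = p(x) / (p(x) + q(x)), so for a sigmoid classifier the raw logit recovers log p(x) - log q(x), which can stand in for an intractable likelihood term. The sketch below is a generic illustration of this estimator (Sugiyama et al., 2012), not VEEGAN itself; the function names are hypothetical.

```python
import torch
import torch.nn as nn

def train_ratio_estimator(disc, sample_p, sample_q, steps=1000, lr=1e-3):
    """Fit a classifier to distinguish samples of p (label 1) from
    samples of q (label 0). At optimum, sigmoid(disc(x)) approximates
    p(x) / (p(x) + q(x))."""
    opt = torch.optim.Adam(disc.parameters(), lr=lr)
    loss_fn = nn.BCEWithLogitsLoss()
    for _ in range(steps):
        xp, xq = sample_p(), sample_q()  # minibatches from the two densities
        logits_p, logits_q = disc(xp), disc(xq)
        logits = torch.cat([logits_p, logits_q])
        labels = torch.cat([torch.ones_like(logits_p),
                            torch.zeros_like(logits_q)])
        opt.zero_grad()
        loss_fn(logits, labels).backward()
        opt.step()
    return disc

def log_density_ratio(disc, x):
    # For the optimal sigmoid discriminator, the raw logit equals
    # log p(x) - log q(x), i.e. the log density ratio.
    return disc(x)
```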

[1] Radford M. Neal. Probabilistic Inference Using Markov Chain Monte Carlo Methods, 2011.

[2] Lucas Theis et al. Amortised MAP Inference for Image Super-resolution. ICLR, 2016.

[3] Wei Li et al. Nonparametric Bayes Pachinko Allocation. UAI, 2007.

[4] Geoffrey E. Hinton et al. Exponential Family Harmoniums with an Application to Information Retrieval. NIPS, 2004.

[5] Radford M. Neal. MCMC Using Hamiltonian Dynamics. arXiv:1206.1901, 2011.

[6] Geoffrey E. Hinton et al. The Helmholtz Machine. Neural Computation, 1995.

[7] Thore Graepel et al. Kernel Topic Models. AISTATS, 2011.

[8] David M. Mimno et al. Reconstructing Pompeian Households. UAI, 2011.

[9] Yoshua Bengio et al. Generative Adversarial Nets. NIPS, 2014.

[10] Aapo Hyvärinen et al. Noise-contrastive estimation: A new estimation principle for unnormalized statistical models. AISTATS, 2010.

[11] Mario Lucic et al. Are GANs Created Equal? A Large-Scale Study. NeurIPS, 2017.

[12] Sanjeev Arora et al. Do GANs Actually Learn the Distribution?, 2018.

[13] Sean Gerrish et al. Black Box Variational Inference. AISTATS, 2013.

[14] Pietro Perona et al. A Bayesian hierarchical model for learning natural scene categories. CVPR, 2005.

[15] Soumith Chintala et al. Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks. ICLR, 2015.

[16] David Duvenaud et al. Backpropagation through the Void: Optimizing control variates for black-box gradient estimation. ICLR, 2017.

[17] P. Diggle et al. Monte Carlo Methods of Inference for Implicit Statistical Models, 1984.

[18] Karol Gregor et al. Neural Variational Inference and Learning in Belief Networks. ICML, 2014.

[19] Hugo Larochelle et al. A Neural Autoregressive Topic Model. NIPS, 2012.

[20] Charles Elkan et al. Expectation Maximization Algorithm. Encyclopedia of Machine Learning, 2010.

[21] David Pfau et al. Unrolled Generative Adversarial Networks. ICLR, 2016.

[22] Timothy Baldwin et al. Automatic Evaluation of Topic Coherence. NAACL, 2010.

[23] Navdeep Jaitly et al. Adversarial Autoencoders. arXiv, 2015.

[24] Brandon M. Turner et al. A tutorial on approximate Bayesian computation, 2012.

[25] Daan Wierstra et al. Stochastic Backpropagation and Approximate Inference in Deep Generative Models. ICML, 2014.

[26] David M. Blei et al. The Generalized Reparameterization Gradient. NIPS, 2016.

[27] Trevor Darrell et al. Adversarial Feature Learning. ICLR, 2016.

[28] Oriol Vinyals et al. Neural Discrete Representation Learning. NIPS, 2017.

[29] Yu Cheng et al. Sobolev GAN. ICLR, 2017.

[30] Joshua B. Tenenbaum et al. Human-level concept learning through probabilistic program induction. Science, 2015.

[31] Yoshua Bengio et al. Mode Regularized Generative Adversarial Networks. ICLR, 2016.

[32] Sepp Hochreiter et al. GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium. NIPS, 2017.

[33] Dustin Tran et al. Edward: A library for probabilistic modeling, inference, and criticism. arXiv, 2016.

[34] Francis R. Bach et al. Online Learning for Latent Dirichlet Allocation. NIPS, 2010.

[35] Guigang Zhang et al. Deep Learning. Int. J. Semantic Comput., 2016.

[36] Matthew D. Zeiler. ADADELTA: An Adaptive Learning Rate Method. arXiv, 2012.

[37] Jimmy Ba et al. Adam: A Method for Stochastic Optimization. ICLR, 2014.

[38] Dustin Tran et al. Automatic Differentiation Variational Inference. J. Mach. Learn. Res., 2016.

[39] John Salvatier et al. Probabilistic programming in Python using PyMC3. PeerJ Comput. Sci., 2016.

[40] Colin Campbell et al. The latent process decomposition of cDNA microarray data sets. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2005.

[41] Shakir Mohamed et al. Variational Inference with Normalizing Flows. ICML, 2015.

[42] Yoshua Bengio et al. Boundary-Seeking Generative Adversarial Networks. ICLR, 2017.

[43] Michael I. Jordan et al. Latent Dirichlet Allocation. J. Mach. Learn. Res., 2001.

[44] Ole Winther et al. Autoencoding beyond pixels using a learned similarity metric. ICML, 2015.

[45] Jakob H. Macke et al. Flexible statistical inference for mechanistic models of neural dynamics. NIPS, 2017.

[46] Tim Salimans et al. Weight Normalization: A Simple Reparameterization to Accelerate Training of Deep Neural Networks. NIPS, 2016.

[47] Noah A. Rosenberg et al. AABC: Approximate approximate Bayesian computation for inference in population-genetic models. Theoretical Population Biology, 2015.

[48] Dustin Tran et al. Hierarchical Implicit Models and Likelihood-Free Variational Inference. NIPS, 2017.

[49] Jun-Yan Zhu et al. Unpaired Image-to-Image Translation Using Cycle-Consistent Adversarial Networks. ICCV, 2017.

[50] Sergey Levine. Reinforcement Learning and Control as Probabilistic Inference: Tutorial and Review. arXiv, 2018.

[51] Jukka Corander et al. Likelihood-Free Inference by Ratio Estimation. Bayesian Analysis, 2016.

[52] Sebastian Nowozin et al. Adversarial Variational Bayes: Unifying Variational Autoencoders and Generative Adversarial Networks. ICML, 2017.

[53] Max Welling et al. Auto-Encoding Variational Bayes. ICLR, 2013.

[54] Didrik Nielsen et al. Fast and Scalable Bayesian Deep Learning by Weight-Perturbation in Adam. ICML, 2018.

[55] Aki Vehtari et al. ELFI: Engine for Likelihood-Free Inference. J. Mach. Learn. Res., 2016.

[56] Sebastian Nowozin et al. The Numerics of GANs. NIPS, 2017.

[57] Wojciech Zaremba et al. Improved Techniques for Training GANs. NIPS, 2016.

[58] D. Knill et al. The Bayesian brain: the role of uncertainty in neural coding and computation. Trends in Neurosciences, 2004.

[59] Takafumi Kanamori et al. Density Ratio Estimation in Machine Learning, 2012.

[60] Aapo Hyvärinen et al. Noise-Contrastive Estimation of Unnormalized Statistical Models, with Applications to Natural Image Statistics. J. Mach. Learn. Res., 2012.

[61] Yoram Singer et al. Adaptive Subgradient Methods for Online Learning and Stochastic Optimization. J. Mach. Learn. Res., 2011.

[62] Arthur Gretton et al. Demystifying MMD GANs. ICLR, 2018.

[63] John D. Lafferty et al. A correlated topic model of Science. arXiv:0708.3601, 2007.

[64] Léon Bottou et al. Wasserstein GAN. arXiv, 2017.

[65] Aaron C. Courville et al. Adversarially Learned Inference. ICLR, 2016.

[66] Ryan P. Adams et al. Composing graphical models with neural networks for structured representations and fast inference. NIPS, 2016.

[67] M. Feldman et al. Population growth of human Y chromosomes: a study of Y chromosome microsatellites. Molecular Biology and Evolution, 1999.

[68] Mohammad Emtiyaz Khan et al. Conjugate-Computation Variational Inference: Converting Variational Inference in Non-Conjugate Models to Inferences in Conjugate Models. AISTATS, 2017.

[69] Michael I. Jordan et al. Graphical Models, Exponential Families, and Variational Inference. Found. Trends Mach. Learn., 2008.

[70] Timothy Baldwin et al. Machine Reading Tea Leaves: Automatically Evaluating Topic Coherence and Topic Model Quality. EACL, 2014.

[71] Thomas L. Griffiths et al. Hierarchical Topic Models and the Nested Chinese Restaurant Process. NIPS, 2003.

[72] Andrew Gordon Wilson et al. Bayesian GAN. NIPS, 2017.

[73] Mark Steyvers et al. Finding scientific topics. Proceedings of the National Academy of Sciences, 2004.

[74] Jiqiang Guo et al. Stan: A Probabilistic Programming Language. Journal of Statistical Software, 2017.

[75] Xiaogang Wang et al. Deep Learning Face Attributes in the Wild. ICCV, 2015.

[76] J. Dickey. Multiple Hypergeometric Functions: Probabilistic Interpretations and Statistical Uses, 1983.

[77] Richard S. Zemel et al. Generative Moment Matching Networks. ICML, 2015.

[78] Scott W. Linderman et al. Reparameterization Gradients through Acceptance-Rejection Sampling Algorithms. AISTATS, 2016.

[79] Wei Li et al. Pachinko allocation: DAG-structured mixture models of topic correlations. ICML, 2006.

[80] Chong Wang et al. Reading Tea Leaves: How Humans Interpret Topic Models. NIPS, 2009.

[81] Motoaki Kawanabe et al. Direct density-ratio estimation with dimensionality reduction via least-squares hetero-distributional subspace search. Neural Networks, 2011.

[82] Geoffrey E. Hinton. Training Products of Experts by Minimizing Contrastive Divergence. Neural Computation, 2002.

[83] Michael I. Jordan et al. An Introduction to Variational Methods for Graphical Models. Machine Learning, 1999.

[84] Sergey Ioffe et al. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. ICML, 2015.

[85] Naftali Tishby et al. Opening the Black Box of Deep Neural Networks via Information. arXiv, 2017.

[86] Phil Blunsom et al. Neural Variational Inference for Text Processing. ICML, 2015.

[87] David J. C. MacKay. Choice of Basis for Laplace Approximation. Machine Learning, 1998.

[88] John D. Lafferty et al. Correlated Topic Models. NIPS, 2005.

[89] Shakir Mohamed et al. Implicit Reparameterization Gradients. NeurIPS, 2018.

[90] Roger B. Grosse et al. Optimizing Neural Networks with Kronecker-factored Approximate Curvature. ICML, 2015.

[91] Christopher M. Bishop. Pattern Recognition and Machine Learning (Information Science and Statistics), 2006.

[92] Max Welling et al. Improved Variational Inference with Inverse Autoregressive Flow. NIPS, 2016.

[93] Thomas Hofmann. Probabilistic latent semantic indexing. SIGIR, 1999.

[94] Sanjoy Dasgupta et al. A Generalization of Principal Components Analysis to the Exponential Family. NIPS, 2001.

[95] Charles A. Sutton et al. VEEGAN: Reducing Mode Collapse in GANs using Implicit Variational Learning. NIPS, 2017.

[96] Thomas M. Cover et al. Elements of Information Theory, 2005.

[97] Yee Whye Teh et al. The Concrete Distribution: A Continuous Relaxation of Discrete Random Variables. ICLR, 2016.

[98] Yiming Yang et al. MMD GAN: Towards Deeper Understanding of Moment Matching Network. NIPS, 2017.

[99] Geoffrey E. Hinton et al. Replicated Softmax: an Undirected Topic Model. NIPS, 2009.

[100] Andrew McCallum et al. Rethinking LDA: Why Priors Matter. NIPS, 2009.

[101] Charles A. Sutton et al. Autoencoding Variational Inference For Topic Models. ICLR, 2017.

[102] Zoubin Ghahramani et al. Turing: A Language for Flexible Probabilistic Inference, 2018.

[103] Zoubin Ghahramani et al. Training generative neural networks via Maximum Mean Discrepancy optimization. UAI, 2015.

[104] Ronald J. Williams. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning. Machine Learning, 2004.

[105] Iain Murray et al. Masked Autoregressive Flow for Density Estimation. NIPS, 2017.

[106] Alex Graves et al. Associative Compression Networks, 2018.

[107] David Duvenaud et al. FFJORD: Free-form Continuous Dynamics for Scalable Reversible Generative Models. ICLR, 2018.

[108] Bernhard Schölkopf et al. A Kernel Two-Sample Test. J. Mach. Learn. Res., 2012.

[109] Wei Li et al. Mixtures of hierarchical topics with Pachinko allocation. ICML, 2007.

[110] Junichiro Hirayama et al. Bregman divergence as general framework to estimate unnormalized statistical models. UAI, 2011.