Deep Exponential Families

We describe \textit{deep exponential families} (DEFs), a class of latent variable models that are inspired by the hidden structures used in deep neural networks. DEFs capture a hierarchy of dependencies between latent variables, and are easily generalized to many settings through exponential families. We perform inference using recent "black box" variational inference techniques. We then evaluate various DEFs on text and combine multiple DEFs into a model for pairwise recommendation data. In an extensive study, we show that going beyond one layer improves predictions for DEFs. We demonstrate that DEFs find interesting exploratory structure in large data sets, and give better predictive performance than state-of-the-art models.

[1]  Eric R. Ziegel,et al.  Generalized Linear Models , 2002, Technometrics.

[2]  H. Robbins A Stochastic Approximation Method , 1951 .

[3]  Geoffrey E. Hinton,et al.  Replicated Softmax: an Undirected Topic Model , 2009, NIPS.

[4]  Geoffrey E. Hinton,et al.  Exponential Family Harmoniums with an Application to Information Retrieval , 2004, NIPS.

[5]  John F. Canny,et al.  GaP: a factor model for discrete data , 2004, SIGIR '04.

[6]  Jaana Kekäläinen,et al.  IR evaluation methods for retrieving highly relevant documents , 2000, SIGIR '00.

[7]  Ruslan Salakhutdinov,et al.  Bayesian probabilistic matrix factorization using Markov chain Monte Carlo , 2008, ICML '08.

[8]  John D. Lafferty,et al.  A correlated topic model of Science , 2007, 0708.3601.

[9]  Benjamin M. Marlin,et al.  Collaborative Filtering: A Machine Learning Perspective , 2004 .

[10]  Karol Gregor,et al.  Neural Variational Inference and Learning in Belief Networks , 2014, ICML.

[11]  Aleks Jakulin,et al.  Applying Discrete PCA in Data Analysis , 2004, UAI.

[12]  Michael I. Jordan,et al.  An Introduction to Variational Methods for Graphical Models , 1999, Machine Learning.

[13]  Tim Salimans,et al.  Fixed-Form Variational Posterior Approximation through Stochastic Linear Regression , 2012, ArXiv.

[14]  Paul Smolensky,et al.  Information processing in dynamical systems: foundations of harmony theory , 1986 .

[15]  Lawrence D. Brown Fundamentals of Statistical Exponential Families , 1987 .

[16]  Daniel Hernández-Lobato,et al.  Generalized spike-and-slab priors for Bayesian group feature selection using expectation propagation , 2013, J. Mach. Learn. Res..

[17]  Nitish Srivastava,et al.  Modeling Documents with Deep Boltzmann Machines , 2013, UAI.

[18]  Michael I. Jordan,et al.  Latent Dirichlet Allocation , 2001, J. Mach. Learn. Res..

[19]  David M. Blei,et al.  Scalable Recommendation with Poisson Factorization , 2013, ArXiv.

[20]  Joseph Hilbe,et al.  Data Analysis Using Regression and Multilevel/Hierarchical Models , 2009 .

[21]  Radford M. Neal Pattern Recognition and Machine Learning , 2007, Technometrics.

[22]  Ruslan Salakhutdinov,et al.  Evaluation methods for topic models , 2009, ICML '09.

[23]  Geoffrey E. Hinton,et al.  Deep Boltzmann Machines , 2009, AISTATS.

[24]  Pascal Vincent,et al.  Representation Learning: A Review and New Perspectives , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[25]  Yee Whye Teh,et al.  A Fast Learning Algorithm for Deep Belief Nets , 2006, Neural Computation.

[26]  Daan Wierstra,et al.  Stochastic Backpropagation and Approximate Inference in Deep Generative Models , 2014, ICML.

[27]  Wei Li,et al.  Pachinko allocation: DAG-structured mixture models of topic correlations , 2006, ICML.

[28]  J. S. Rao,et al.  Spike and slab variable selection: Frequentist and Bayesian strategies , 2005, math/0505633.

[29]  Yoshua Bengio,et al.  Large-Scale Feature Learning With Spike-and-Slab Sparse Coding , 2012, ICML.

[30]  Thomas L. Griffiths,et al.  Hierarchical Topic Models and the Nested Chinese Restaurant Process , 2003, NIPS.

[31]  John K Kruschke,et al.  Bayesian data analysis. , 2010, Wiley interdisciplinary reviews. Cognitive science.

[32]  H. Sebastian Seung,et al.  Learning the parts of objects by non-negative matrix factorization , 1999, Nature.

[33]  John D. Lafferty,et al.  Dynamic topic models , 2006, ICML.

[34]  Donald Geman,et al.  Stochastic Relaxation, Gibbs Distributions, and the Bayesian Restoration of Images , 1984, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[35]  Sean Gerrish,et al.  Black Box Variational Inference , 2013, AISTATS.

[36]  Ruslan Salakhutdinov,et al.  Probabilistic Matrix Factorization , 2007, NIPS.

[37]  Hugo Larochelle,et al.  A Neural Autoregressive Topic Model , 2012, NIPS.

[38]  David Wingate,et al.  Automated Variational Inference in Probabilistic Programming , 2013, ArXiv.

[39]  Katherine A. Heller,et al.  Bayesian Exponential Family PCA , 2008, NIPS.

[40]  Zoubin Ghahramani,et al.  Propagation Algorithms for Variational Bayesian Learning , 2000, NIPS.

[41]  Michael I. Jordan,et al.  Mean Field Theory for Sigmoid Belief Networks , 1996, J. Artif. Intell. Res..