Learning the Structure of Deep Sparse Graphical Models

Deep belief networks are a powerful way to model complex probability distributions. However, it is difficult to learn the structure of a belief network, particularly one with hidden units. The Indian buffet process has been used as a nonparametric Bayesian prior on the structure of a directed belief network with a single infinitely wide hidden layer. Here, we introduce the cascading Indian buffet process (CIBP), which provides a prior on the structure of a layered, directed belief network that is unbounded in both depth and width, yet allows tractable inference. We use the CIBP prior with the nonlinear Gaussian belief network framework to allow each unit to vary its behavior between discrete and continuous representations. We use Markov chain Monte Carlo for inference in this model and explore the structures learned on image data.
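To make the cascading construction concrete, here is a minimal sketch (in Python/NumPy) of sampling a network structure from a CIBP-style prior: the units of each layer act as customers in an Indian buffet process whose dishes are parent units in the layer above, and the network grows deeper only as long as new parents keep being requested. The function names, the single concentration parameter alpha shared across layers, and the max_layers safety cap are illustrative assumptions, not the paper's implementation.

    import numpy as np

    def sample_ibp(num_customers, alpha, rng):
        # Draw a binary matrix Z from the Indian buffet process: rows are
        # customers (units in the current layer), columns are dishes
        # (parent units in the layer above).
        dish_counts = []  # number of customers that have chosen each dish so far
        rows = []
        for i in range(num_customers):
            # Existing dish k is chosen with probability m_k / (i + 1).
            row = [rng.random() < c / (i + 1) for c in dish_counts]
            for k, chose in enumerate(row):
                if chose:
                    dish_counts[k] += 1
            # The customer then samples Poisson(alpha / (i + 1)) new dishes.
            n_new = rng.poisson(alpha / (i + 1))
            dish_counts.extend([1] * n_new)
            rows.append(row + [True] * n_new)
        Z = np.zeros((num_customers, len(dish_counts)), dtype=bool)
        for i, row in enumerate(rows):
            Z[i, :len(row)] = row
        return Z

    def sample_cibp_structure(num_visible, alpha, seed=0, max_layers=50):
        # Cascade IBPs upward: each layer's units act as customers choosing
        # parents in the next layer up, and the new dishes they create become
        # that layer's units. The cascade stops once a layer requests no
        # parents at all.
        rng = np.random.default_rng(seed)
        layers = []
        k = num_visible
        for _ in range(max_layers):  # safety cap; the true CIBP terminates a.s.
            Z = sample_ibp(k, alpha, rng)
            if Z.shape[1] == 0:      # no parent units requested: network ends here
                break
            layers.append(Z)         # Z[i, j] = 1 iff unit j above is a parent of unit i
            k = Z.shape[1]
        return layers

For example, sample_cibp_structure(10, 1.0) returns a list of binary parent matrices, one per layer, whose column counts trace the sampled widths of the hidden layers.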
