Inductive Principles for Restricted Boltzmann Machine Learning

Recent research has seen the proposal of several new inductive principles designed specifically to avoid the problems associated with maximum likelihood learning in models with intractable partition functions. In this paper, we study learning methods for binary restricted Boltzmann machines (RBMs) based on ratio matching and generalized score matching. We compare these new RBM learning methods to a range of existing learning methods including stochastic maximum likelihood, contrastive divergence, and pseudo-likelihood. We perform an extensive empirical evaluation across multiple tasks and data sets.

[1]  D K Smith,et al.  Numerical Optimization , 2001, J. Oper. Res. Soc..

[2]  Honglak Lee,et al.  Sparse deep belief net model for visual area V2 , 2007, NIPS.

[3]  H. Robbins A Stochastic Approximation Method , 1951 .

[4]  Marc'Aurelio Ranzato,et al.  Sparse Feature Learning for Deep Belief Networks , 2007, NIPS.

[5]  HighWire Press Philosophical Transactions of the Royal Society of London , 1781, The London Medical Journal.

[6]  Yuh-Jye Lee,et al.  SSVM: A Smooth Support Vector Machine for Classification , 2001, Comput. Optim. Appl..

[7]  R. Fisher,et al.  On the Mathematical Foundations of Theoretical Statistics , 1922 .

[8]  Aapo Hyvärinen,et al.  Some extensions of score matching , 2007, Comput. Stat. Data Anal..

[9]  L. Younes Parametric Inference for imperfectly observed Gibbsian fields , 1989 .

[10]  Siwei Lyu,et al.  Interpretation and Generalization of Score Matching , 2009, UAI.

[11]  Geoffrey E. Hinton Training Products of Experts by Minimizing Contrastive Divergence , 2002, Neural Computation.

[12]  Geoffrey E. Hinton,et al.  To recognize shapes, first learn to generate images. , 2007, Progress in brain research.

[13]  Paul Smolensky,et al.  Information processing in dynamical systems: foundations of harmony theory , 1986 .

[14]  Martin J. Wainwright,et al.  Tree-reweighted belief propagation algorithms and approximate ML estimation by pseudo-moment matching , 2003, AISTATS.

[15]  J. Besag Statistical Analysis of Non-Lattice Data , 1975 .

[16]  Ruslan Salakhutdinov,et al.  Evaluating probabilities under high-dimensional latent variable models , 2008, NIPS.

[17]  Geoffrey E. Hinton,et al.  Exponential Family Harmoniums with an Application to Information Retrieval , 2004, NIPS.

[18]  Tijmen Tieleman,et al.  Training restricted Boltzmann machines using approximations to the likelihood gradient , 2008, ICML '08.

[19]  James L. McClelland,et al.  Parallel distributed processing: explorations in the microstructure of cognition, vol. 1: foundations , 1986 .

[20]  David J. C. MacKay,et al.  Information Theory, Inference, and Learning Algorithms , 2004, IEEE Transactions on Information Theory.