Restricted Boltzmann Machines as Models of Interacting Variables

Abstract

We study the types of distributions that restricted Boltzmann machines (RBMs) with different activation functions can express by investigating the effect of the hidden-node activation function on the marginal distribution the RBM imposes on observed binary nodes. We report an exact expression for these marginals in the form of a model of interacting binary variables, with the explicit form of the interactions depending on the hidden-node activation function. We study the properties of these interactions in detail and evaluate how the accuracy with which the RBM approximates distributions over binary variables depends on the hidden-node activation function and on the number of hidden nodes. When the inferred RBM parameters are weak, an intuitive pattern emerges for the expression of the interaction terms, which substantially reduces the differences across activation functions. We show that this weak-parameter approximation holds well for different RBMs trained on the MNIST data set. Interestingly, in these cases, the mapping reveals that the inferred models are essentially low-order interaction models.
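To make the mapping concrete, the Python sketch below writes the exact unnormalized log-marginal of an RBM over its binary visibles, v·a + Σ_μ Γ(b_μ + Σ_i W_iμ v_i), where Γ is the cumulant generating function set by the hidden-node activation (softplus for binary hidden units, x²/2 for Gaussian ones), and then Taylor-expands Γ around the hidden biases to read off effective fields and pairwise couplings in the weak-parameter regime described above. The function names, the numerical derivatives, and the truncation at second order are illustrative choices, not the paper's exact formulation.

```python
import numpy as np


def softplus(x):
    # Numerically stable log(1 + exp(x)); this is the cumulant generating
    # function Gamma of a binary {0, 1} hidden unit.
    return np.logaddexp(0.0, x)


def log_marginal_unnorm(v, a, b, W, gamma=softplus):
    """Exact unnormalized log-marginal over binary visibles v after the
    hidden units have been summed (or integrated) out:
        v . a + sum_mu Gamma(b_mu + sum_i W_imu v_i).
    a: visible biases (n_vis,); b: hidden biases (n_hid,);
    W: weights (n_vis, n_hid); gamma: per-hidden-unit cumulant generating
    function (softplus for binary hidden units, 0.5 * x**2 for Gaussian ones).
    """
    return v @ a + gamma(b + v @ W).sum()


def effective_couplings(b, W, gamma=softplus, eps=1e-4):
    """Effective fields and pairwise couplings among the visibles from a
    second-order Taylor expansion of gamma around the hidden biases b
    (the weak-parameter regime). Derivatives are taken numerically so any
    smooth gamma can be plugged in; an illustrative sketch, not the
    paper's closed form.
    """
    g1 = (gamma(b + eps) - gamma(b - eps)) / (2 * eps)              # Gamma'
    g2 = (gamma(b + eps) - 2 * gamma(b) + gamma(b - eps)) / eps**2  # Gamma''
    h = W @ g1            # field on visible i: sum_mu Gamma'(b_mu) W_imu
    J = (W * g2) @ W.T    # coupling J_ij = sum_mu Gamma''(b_mu) W_imu W_jmu
    return h, J


# Quick check in the weak-parameter regime: the truncated interaction
# model should track the exact log-marginal up to third-order terms in W.
rng = np.random.default_rng(0)
n_vis, n_hid = 8, 4
a = rng.normal(size=n_vis)
b = rng.normal(size=n_hid)
W = 0.1 * rng.normal(size=(n_vis, n_hid))  # small weights
v = rng.integers(0, 2, size=n_vis).astype(float)

h, J = effective_couplings(b, W)
exact = log_marginal_unnorm(v, a, b, W)
approx = softplus(b).sum() + v @ (a + h) + 0.5 * v @ J @ v
print(exact, approx)  # close for weak W
```

Higher-order interaction terms follow the same pattern, with the k-th derivative of Γ at the hidden biases weighting products of k weights, which is why, for weak parameters, the differences across activation functions largely collapse into these derivative values.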
