Free energies of Boltzmann machines: self-averaging, annealed and replica symmetric approximations in the thermodynamic limit

Restricted Boltzmann machines (RBMs) are among the main models for statistical inference by machines, and they are widely employed in artificial intelligence as powerful tools for (deep) learning. However, despite their many practical successes, their mathematical formalization has remained largely elusive: from a statistical-mechanics perspective, these systems share the (random) Gibbs measure of bipartite spin glasses, whose rigorous treatment is notoriously difficult. In this work, besides providing a brief review of RBMs from both the learning and the retrieval perspectives, we aim to contribute to their analytical investigation by considering two distinct realizations of their weights (i.e. Boolean and Gaussian) and studying the properties of the related free energies. More precisely, focusing on an RBM with digital (Boolean) couplings, we first extend the Pastur–Shcherbina–Tirozzi method (originally developed for the Hopfield model) to prove that the free energy is self-averaging around its quenched expectation in the infinite volume limit; we then explicitly calculate its simplest approximation, namely the annealed bound. Next, focusing on an RBM with analogical (Gaussian) weights, we extend Guerra’s interpolating scheme to control the quenched free energy under the assumption of replica symmetry (i.e. we require that the order parameters do not fluctuate in the thermodynamic limit): we obtain self-consistency equations for the order parameters (in full agreement with the existing literature) as well as the critical line for ergodicity breaking, which turns out to coincide with the one obtained in AGS theory. As we discuss, this analogy stems from the slow-noise universality. Finally, glancing beyond replica symmetry, we analyze the fluctuations of the overlaps to correctly estimate the (slow) noise affecting the retrieval of the signal, and, through a stability analysis, we recover the Aizenman–Contucci identities typical of glassy systems.
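
To make the annealed bound concrete, the following is a minimal sketch, not the paper's own construction. It assumes a toy RBM in which both layers are binary (±1) units, the weights are independent Boolean (±1) variables, and the coupling is scaled by 1/sqrt(N) with N the number of visible units; these conventions, and all function names, are illustrative assumptions. For a system this small, the average over the weights can be computed by exhaustive enumeration, so Jensen's inequality E[log Z] <= log E[Z] can be checked exactly, and log E[Z] can be compared with the closed form (N + P) log 2 + N P log cosh(beta / sqrt(N)) that holds under the same assumptions.

```python
import itertools
import math


def log_partition(weights, beta):
    """log Z for one weight realization (assumed toy model, +/-1 units).

    The hidden units are summed analytically: each one contributes a
    factor 2*cosh(beta/sqrt(N) * sum_i w[i][mu] * sigma[i]).
    """
    n_visible = len(weights)
    n_hidden = len(weights[0])
    scale = beta / math.sqrt(n_visible)
    z = 0.0
    for sigma in itertools.product((-1, 1), repeat=n_visible):
        term = 1.0
        for mu in range(n_hidden):
            field = sum(weights[i][mu] * sigma[i] for i in range(n_visible))
            term *= 2.0 * math.cosh(scale * field)
        z += term
    return math.log(z)


def quenched_and_annealed(n_visible, n_hidden, beta):
    """Exact E[log Z] (quenched) and log E[Z] (annealed).

    The expectation runs over all 2^(N*P) Boolean weight matrices,
    which is only feasible for very small systems.
    """
    log_zs = []
    for flat in itertools.product((-1, 1), repeat=n_visible * n_hidden):
        weights = [flat[i * n_hidden:(i + 1) * n_hidden]
                   for i in range(n_visible)]
        log_zs.append(log_partition(weights, beta))
    quenched = sum(log_zs) / len(log_zs)
    annealed = math.log(sum(math.exp(v) for v in log_zs) / len(log_zs))
    return quenched, annealed


def annealed_closed_form(n_visible, n_hidden, beta):
    """Closed form of log E[Z] under the same assumed conventions."""
    scale = beta / math.sqrt(n_visible)
    return ((n_visible + n_hidden) * math.log(2.0)
            + n_visible * n_hidden * math.log(math.cosh(scale)))


if __name__ == "__main__":
    q, a = quenched_and_annealed(n_visible=3, n_hidden=2, beta=1.0)
    print("quenched E[log Z]:", q)
    print("annealed log E[Z]:", a)
    print("closed form      :", annealed_closed_form(3, 2, 1.0))
```

The quantities here are log Z rather than a free energy per unit, so the sign and normalization conventions of the paper are left aside; the ordering between the quenched and annealed values is what the sketch is meant to show.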
