Spectral Learning of Restricted Boltzmann Machines

The restricted Boltzmann machine (RBM), an important machine learning tool used in particular for unsupervised learning tasks, is investigated from the perspective of its spectral properties. Based on empirical observations, we propose a generic statistical ensemble for the weight matrix of the RBM and characterize its mean evolution under common learning procedures, as a function of some statistical properties of the data. In particular, we identify the main unstable deformation modes of the weight matrix which emerge at the beginning of the learning, and we clarify how these modes interact in later stages of the learning procedure.

Introduction. – Restricted Boltzmann machines (RBM) [1] constitute nowadays a common tool on the shelf of machine learning practitioners. The RBM is a generative model, in the sense that it defines a probability distribution which can be learned to approximate any distribution of data points living in some N-dimensional space, with N potentially large. It also often constitutes a building block of more complex neural network models [2, 3]. The standard learning procedure, called contrastive divergence [4], is well documented [5], although it remains a not so well understood empirical art, with many hyperparameters to tune and few guidelines.

At the same time, an RBM can be regarded as a statistical physics model, being defined as a Boltzmann distribution with pairwise interactions on a bipartite graph. Similar models were already the subject of many studies in the 80's [6–9], which mainly concentrated on the learning capacity, i.e. the number of independent patterns that can be stored in such a model. The second life of neural networks has renewed the interest of statistical physicists in such models. Recent works propose to exploit the statistical physics formulation of the RBM to define mean-field learning methods based on TAP equations [10–12]. Meanwhile, analyses of its static properties, assuming a given learned weight matrix W, have been proposed [13, 14] in order to understand collective phenomena in the latent representation [15], i.e. the way latent variables organize themselves to represent actual data. One common assumption made in these works is that the entries of W are i.i.d., which, as we shall see, is unrealistic.

Concerning the learning procedure of neural networks, many recent statistical-physics-based analyses have been proposed, most of them within the teacher-student setting [16], which imposes a strong assumption on the data, namely that they are generated from a model belonging to the parametric family of interest, thereby hiding the role played by the data themselves in the procedure. From the analysis of related models [17, 18], it is already a well established fact that, in the linear case, learning performs a selection of the most important modes of the singular value decomposition (SVD) of the data. In fact, in the simpler context of linear feed-forward models, the learning dynamics can be fully characterized by means of the SVD of the data matrix [19], showing in particular the emergence of each mode in order of importance with respect to its singular value. In this work we follow this guideline in the context of a general RBM: we propose to characterize both the learned RBM and the learning process itself by the SVD spectrum of the weight matrix, in order to isolate the information content of an RBM.
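For reference, and to fix notation (the symbols v_i, h_j, a_i, b_j below are our choice for this illustration, not necessarily those of the works cited), a binary RBM over visible units $v \in \{0,1\}^N$ and hidden units $h \in \{0,1\}^M$ is the Boltzmann distribution

    p(v, h) = \frac{1}{Z}\, e^{-E(v, h)}, \qquad E(v, h) = -\sum_i a_i v_i - \sum_j b_j h_j - \sum_{i,j} v_i W_{ij} h_j,

and the spectral object tracked throughout is the singular value decomposition of the weight matrix,

    W = \sum_{\alpha=1}^{\min(N, M)} w_\alpha\, u_\alpha v_\alpha^T,

with singular values $w_\alpha$ and left and right singular vectors $u_\alpha$, $v_\alpha$.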
This allows us to write a deterministic learning equation, leaving aside the fluctuations. This equation is subsequently analyzed, first in the linear regime, to identify the unstable deformation modes of W; secondly at equilibrium, assuming the learning converges, in order to understand the nature of the non-linear regime.
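To make the learning procedure discussed above concrete, the following is a minimal, self-contained sketch of CD-1 training of a binary RBM, with spectral monitoring of W at regular intervals. It is a schematic illustration under assumed conventions (the toy dimensions, learning rate, and random binary data are placeholders), not the experimental setup used in this work:

    import numpy as np

    rng = np.random.default_rng(0)

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def cd1_update(W, a, b, v0, lr=0.01):
        """One CD-1 update for a binary RBM on a minibatch v0 (rows = samples)."""
        # Positive phase: hidden activations driven by the data.
        ph0 = sigmoid(v0 @ W + b)
        h0 = (rng.random(ph0.shape) < ph0).astype(float)
        # Negative phase: one Gibbs step back to the visible layer and up again.
        pv1 = sigmoid(h0 @ W.T + a)
        v1 = (rng.random(pv1.shape) < pv1).astype(float)
        ph1 = sigmoid(v1 @ W + b)
        # Log-likelihood gradient, approximated by CD-1.
        n = v0.shape[0]
        W += lr * (v0.T @ ph0 - v1.T @ ph1) / n
        a += lr * (v0 - v1).mean(axis=0)
        b += lr * (ph0 - ph1).mean(axis=0)
        return W, a, b

    # Toy setup: N visible and M hidden units, random binary "data".
    N, M, n_samples = 50, 20, 500
    data = (rng.random((n_samples, N)) < 0.5).astype(float)
    W = 0.01 * rng.standard_normal((N, M))
    a, b = np.zeros(N), np.zeros(M)

    for epoch in range(100):
        batch = data[rng.choice(n_samples, 32, replace=False)]
        W, a, b = cd1_update(W, a, b, batch)
        if epoch % 20 == 0:
            # Spectral monitoring: singular values of the weight matrix.
            w_alpha = np.linalg.svd(W, compute_uv=False)
            print(epoch, w_alpha[:5])

Monitoring the singular values w_alpha along such a run is precisely the kind of observable on which the spectral characterization of the learning process proposed here is based.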