On the role of synaptic stochasticity in training low-precision neural networks

Stochasticity and limited precision of synaptic weights in neural network models are key aspects of both biological and hardware modeling of learning processes. Here we show that a neural network model with stochastic binary weights naturally gives prominence to exponentially rare dense regions of solutions with a number of desirable properties such as robustness and good generalization performance, while typical solutions are isolated and hard to find. Binary solutions of the standard perceptron problem are obtained from a simple gradient descent procedure on a set of real values parametrizing a probability distribution over the binary synapses. Both analytical and numerical results are presented. An algorithmic extension that allows training of discrete deep neural networks is also investigated.
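
The gradient-based procedure over a distribution of binary synapses can be illustrated compactly. The following is a minimal sketch, not the paper's exact algorithm: it assumes a factorized distribution over weights w_i in {-1, +1} parametrized by magnetizations m_i = tanh(theta_i), a Gaussian (central-limit) approximation of the pre-activation, and gradient descent on the resulting expected misclassification probability; the function and parameter names are invented for the example.

```python
import numpy as np

def train_stochastic_binary_perceptron(patterns, labels, lr=0.1, epochs=500, seed=0):
    """patterns: (P, N) array of +/-1 inputs; labels: (P,) array of +/-1 targets."""
    rng = np.random.default_rng(seed)
    P, N = patterns.shape
    theta = 0.1 * rng.standard_normal(N)          # real parameters, one per synapse
    for _ in range(epochs):
        m = np.tanh(theta)                        # magnetizations: E[w_i] = m_i
        mean = patterns @ m                       # (P,) mean pre-activation per pattern
        var = np.maximum((1.0 - m ** 2).sum(), 1e-9)  # variance (same for all +/-1 patterns)
        z = labels * mean / np.sqrt(var)          # signed stability in units of the std
        # Expected training error under the Gaussian approximation is sum_mu Phi(-z_mu),
        # whose gradient w.r.t. z_mu is -pdf(z_mu), with pdf the standard normal density.
        pdf = np.exp(-0.5 * z ** 2) / np.sqrt(2.0 * np.pi)
        dz_dm = (labels[:, None] * patterns) / np.sqrt(var) \
                + (labels * mean)[:, None] * m[None, :] / var ** 1.5
        grad_theta = (-pdf[:, None] * dz_dm).sum(axis=0) * (1.0 - m ** 2)  # chain rule through tanh
        theta -= lr * grad_theta
    return np.sign(theta)                         # binarize: pick the most probable weights

if __name__ == "__main__":
    # Toy teacher-student check with random +/-1 patterns.
    rng = np.random.default_rng(1)
    N, P = 101, 40
    teacher = rng.choice([-1.0, 1.0], size=N)
    X = rng.choice([-1.0, 1.0], size=(P, N))
    y = np.sign(X @ teacher)
    w = train_stochastic_binary_perceptron(X, y)
    print("training accuracy:", np.mean(np.sign(X @ w) == y))
```

The key design choice mirrored here is that the optimization variables are continuous (the fields theta), while the final network is obtained by binarizing them, so standard gradient descent can be applied to a discrete-weight problem.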
