Subdominant Dense Clusters Allow for Simple Learning and High Computational Performance in Neural Networks with Discrete Synapses.

We show that discrete synaptic weights can be efficiently used for learning in large-scale neural systems, and lead to unanticipated computational performance. We focus on the representative case of learning random patterns with binary synapses in single-layer networks. The standard statistical analysis shows that this problem is exponentially dominated by isolated solutions that are extremely hard to find algorithmically. Here, we introduce a novel method that allows us to find analytical evidence for the existence of subdominant and extremely dense regions of solutions. Numerical experiments confirm these findings. We also show that the dense regions are surprisingly accessible by simple learning protocols, and that these synaptic configurations are robust to perturbations and generalize better than typical solutions. These outcomes extend to synapses with multiple states and to deeper neural architectures. The large deviation measure also suggests how to design novel algorithmic schemes for optimization based on local entropy maximization.
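
To make the setting concrete, the sketch below sets up the storage problem described above: a single-layer perceptron with N binary (±1) synapses asked to classify P random ±1 patterns. The update rule shown (flipping a couple of misaligned weights for a randomly chosen misclassified pattern) is only a hypothetical toy heuristic for illustration, not the learning protocol or the local-entropy scheme analyzed in the paper; the sizes N and P and the flip budget are illustrative assumptions.

```python
import numpy as np

# Toy illustration of the binary-perceptron storage problem (not the paper's algorithm):
# try to store P random +/-1 patterns in a single-layer network with N binary synapses.
rng = np.random.default_rng(0)
N, P = 201, 30                              # assumed sizes; load alpha = P/N ~ 0.15
xi = rng.choice([-1, 1], size=(P, N))       # random input patterns
sigma = rng.choice([-1, 1], size=P)         # random target outputs
w = rng.choice([-1, 1], size=N)             # binary synaptic weights

def violated(w):
    # A pattern mu counts as stored when its stability sigma_mu * (w . xi_mu) is positive.
    return np.flatnonzero(sigma * (xi @ w) <= 0)

for step in range(200_000):
    viol = violated(w)
    if viol.size == 0:
        print(f"all {P} patterns stored after {step} updates")
        break
    mu = rng.choice(viol)                   # pick one misclassified pattern
    # Flip a couple of weights whose contribution to pattern mu's stability is negative.
    bad = np.flatnonzero(sigma[mu] * xi[mu] * w < 0)
    flip = rng.choice(bad, size=min(2, bad.size), replace=False)
    w[flip] *= -1
else:
    print(f"stopped with {violated(w).size} patterns still unsatisfied")
```

Whether a blind search of this kind succeeds depends strongly on the load P/N: as the abstract states, typical solutions of this constraint problem are isolated and algorithmically hard to reach, whereas the subdominant dense regions of solutions remain accessible to simple protocols.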
