Deep learning via message passing algorithms based on belief propagation

Message-passing algorithms based on the Belief Propagation (BP) equations constitute a well-known distributed computational scheme. The scheme is exact on tree-like graphical models and has also proven effective in many problems defined on graphs with loops, from inference to optimization, from signal processing to clustering. The BP-based approach is fundamentally different from stochastic gradient descent (SGD), on which the current success of deep networks is based. In this paper, we present a family of BP-based message-passing algorithms with a reinforcement field that biases the weight distributions towards locally entropic solutions, and we adapt them to mini-batch training on GPUs. These algorithms can train multi-layer neural networks with discrete weights and activations with performance comparable to SGD-inspired heuristics (BinaryNet), and they are naturally well suited to continual learning. Furthermore, using these algorithms to estimate the marginals of the weights allows us to make approximate Bayesian predictions that are more accurate than point-wise solutions.
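
To make the mechanism concrete, the following is a minimal sketch (not the authors' released code) of BP with a reinforcement field, reduced to the classic single-layer case: a perceptron with binary (+/-1) weights treated in the large-N Gaussian-cavity approximation. The function names, the annealing schedule (rho), and the damping value are assumptions made for the example; the paper's algorithms extend this kind of scheme to multi-layer networks, mini-batch updates on GPUs, and Bayesian predictions from the estimated marginals.

```python
# Illustrative sketch only: reinforced BP for a +/-1-weight perceptron.
# BP messages are computed in the large-N Gaussian-cavity approximation,
# and a reinforcement field, annealed towards full strength, progressively
# polarizes the weight marginals into a single configuration.
import numpy as np
from scipy.special import log_ndtr  # log of the standard normal CDF (numerically stable)

def reinforced_bp_perceptron(xi, sigma, n_iters=200, rho=0.02, damping=0.5):
    """xi: (P, N) array of +/-1 inputs; sigma: (P,) array of +/-1 labels.
    Returns +/-1 weights W obtained by BP with a reinforcement field."""
    P, N = xi.shape
    u = np.zeros((P, N))    # factor-to-variable messages (cavity fields) u_{mu -> i}
    h_r = np.zeros(N)       # reinforcement field acting on each weight
    for t in range(n_iters):
        h = u.sum(axis=0) + h_r                 # total local field on each weight
        m_cav = np.tanh(h[None, :] - u)         # cavity magnetizations m_{i -> mu}
        # Gaussian approximation of the cavity pre-activation for each pattern
        omega = sigma[:, None] * ((xi * m_cav).sum(axis=1, keepdims=True) - xi * m_cav)
        var = (1.0 - m_cav**2).sum(axis=1, keepdims=True) - (1.0 - m_cav**2)
        sd = np.sqrt(np.maximum(var, 1e-10))
        a = sigma[:, None] * xi
        # BP message: half log-ratio of the constraint being satisfied for w_i = +/-1
        u_new = 0.5 * (log_ndtr((omega + a) / sd) - log_ndtr((omega - a) / sd))
        u = damping * u + (1.0 - damping) * u_new
        # Reinforcement: feed the current beliefs back as an external field whose
        # strength gamma is annealed towards 1, polarizing the marginals
        gamma = 1.0 - (1.0 - rho) ** (t + 1)
        h_r = gamma * (u.sum(axis=0) + h_r)
        W = np.where(u.sum(axis=0) + h_r >= 0, 1.0, -1.0)
        if np.all(sigma * (xi @ W) > 0):        # stop once all patterns are fit
            break
    return W

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    N, P = 201, 60                              # small teacher-student instance
    teacher = rng.choice([-1.0, 1.0], size=N)
    xi = rng.choice([-1.0, 1.0], size=(P, N))
    sigma = np.sign(xi @ teacher)
    W = reinforced_bp_perceptron(xi, sigma)
    print("train accuracy:", np.mean(np.sign(xi @ W) == sigma))
```

In this sketch the reinforcement field plays the role described in the abstract: without it, plain BP estimates marginals over the whole solution space; feeding the current beliefs back as an increasingly strong external field biases the iteration towards a single, locally entropic weight assignment, while the intermediate marginals remain available for approximate Bayesian predictions.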
