Deep learning via message passing algorithms based on belief propagation

Message-passing algorithms based on the Belief Propagation (BP) equations constitute a well-known distributed computational scheme. The scheme is exact on tree-like graphical models and has also proven effective in many problems defined on graphs with loops, from inference to optimization, from signal processing to clustering. The BP-based approach is fundamentally different from stochastic gradient descent (SGD), on which the current success of deep networks is based. In this paper, we present a family of BP-based message-passing algorithms with a reinforcement field that biases the weight distributions towards locally entropic solutions, and we adapt them to mini-batch training on GPUs. These algorithms can train multi-layer neural networks with discrete weights and activations with performance comparable to SGD-inspired heuristics (BinaryNet), and they are naturally well suited to continual learning. Furthermore, using these algorithms to estimate the marginals of the weights allows us to make approximate Bayesian predictions that are more accurate than point-wise solutions.
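
To make the mechanism concrete, the following is a minimal sketch (not the authors' released code) of BP with a reinforcement field, reduced to the classic single-layer case: a perceptron with binary (+/-1) weights treated in the large-N Gaussian-cavity approximation. The function names, the annealing schedule (rho), and the damping value are assumptions made for the example; the paper's algorithms extend this kind of scheme to multi-layer networks, mini-batch updates on GPUs, and Bayesian predictions from the estimated marginals.

```python
# Illustrative sketch only: reinforced BP for a +/-1-weight perceptron.
# BP messages are computed in the large-N Gaussian-cavity approximation,
# and a reinforcement field, annealed towards full strength, progressively
# polarizes the weight marginals into a single configuration.
import numpy as np
from scipy.special import log_ndtr  # log of the standard normal CDF (numerically stable)

def reinforced_bp_perceptron(xi, sigma, n_iters=200, rho=0.02, damping=0.5):
    """xi: (P, N) array of +/-1 inputs; sigma: (P,) array of +/-1 labels.
    Returns +/-1 weights W obtained by BP with a reinforcement field."""
    P, N = xi.shape
    u = np.zeros((P, N))    # factor-to-variable messages (cavity fields) u_{mu -> i}
    h_r = np.zeros(N)       # reinforcement field acting on each weight
    for t in range(n_iters):
        h = u.sum(axis=0) + h_r                 # total local field on each weight
        m_cav = np.tanh(h[None, :] - u)         # cavity magnetizations m_{i -> mu}
        # Gaussian approximation of the cavity pre-activation for each pattern
        omega = sigma[:, None] * ((xi * m_cav).sum(axis=1, keepdims=True) - xi * m_cav)
        var = (1.0 - m_cav**2).sum(axis=1, keepdims=True) - (1.0 - m_cav**2)
        sd = np.sqrt(np.maximum(var, 1e-10))
        a = sigma[:, None] * xi
        # BP message: half log-ratio of the constraint being satisfied for w_i = +/-1
        u_new = 0.5 * (log_ndtr((omega + a) / sd) - log_ndtr((omega - a) / sd))
        u = damping * u + (1.0 - damping) * u_new
        # Reinforcement: feed the current beliefs back as an external field whose
        # strength gamma is annealed towards 1, polarizing the marginals
        gamma = 1.0 - (1.0 - rho) ** (t + 1)
        h_r = gamma * (u.sum(axis=0) + h_r)
        W = np.where(u.sum(axis=0) + h_r >= 0, 1.0, -1.0)
        if np.all(sigma * (xi @ W) > 0):        # stop once all patterns are fit
            break
    return W

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    N, P = 201, 60                              # small teacher-student instance
    teacher = rng.choice([-1.0, 1.0], size=N)
    xi = rng.choice([-1.0, 1.0], size=(P, N))
    sigma = np.sign(xi @ teacher)
    W = reinforced_bp_perceptron(xi, sigma)
    print("train accuracy:", np.mean(np.sign(xi @ W) == sigma))
```

In this sketch the reinforcement field plays the role described in the abstract: without it, plain BP estimates marginals over the whole solution space; feeding the current beliefs back as an increasingly strong external field biases the iteration towards a single, locally entropic weight assignment, while the intermediate marginals remain available for approximate Bayesian predictions.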
