The Forward-Forward Algorithm: Some Preliminary Investigations

The aim of this paper is to introduce a new learning procedure for neural networks and to demonstrate that it works well enough on a few small problems to be worth further investigation. The Forward-Forward algorithm replaces the forward and backward passes of backpropagation with two forward passes: one with positive (i.e., real) data and the other with negative data, which could be generated by the network itself. Each layer has its own objective function, which is simply to have high goodness for positive data and low goodness for negative data. The sum of the squared activities in a layer can be used as the goodness, but there are many other possibilities, including minus the sum of the squared activities. If the positive and negative passes could be separated in time, the negative passes could be done offline, which would make learning in the positive pass much simpler and allow video to be pipelined through the network without ever storing activities or stopping to propagate derivatives.
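
The layer-wise objective described above lends itself to a compact implementation. The sketch below is a minimal PyTorch rendering of one such layer, assuming a ReLU layer with length-normalized input, a logistic loss on goodness minus a fixed threshold, and Adam as the per-layer optimizer; the class name FFLayer, the threshold value, and the learning rate are illustrative assumptions rather than details taken from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class FFLayer(nn.Module):
    """One Forward-Forward layer, trained with a purely local objective (a sketch)."""

    def __init__(self, in_dim, out_dim, threshold=2.0, lr=0.03):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)
        self.relu = nn.ReLU()
        self.threshold = threshold  # assumed goodness threshold, not from the paper
        self.opt = torch.optim.Adam(self.parameters(), lr=lr)

    def forward(self, x):
        # Normalize so that only the direction of the incoming activity vector
        # carries information to this layer.
        x = x / (x.norm(p=2, dim=1, keepdim=True) + 1e-4)
        return self.relu(self.linear(x))

    def train_step(self, x_pos, x_neg):
        # Goodness = sum of squared activities (one of the choices mentioned above).
        g_pos = self.forward(x_pos).pow(2).sum(dim=1)
        g_neg = self.forward(x_neg).pow(2).sum(dim=1)
        # Assumed logistic loss: push goodness above the threshold for positive
        # data and below it for negative data.
        loss = F.softplus(torch.cat([
            self.threshold - g_pos,   # positive pass: want high goodness
            g_neg - self.threshold,   # negative pass: want low goodness
        ])).mean()
        self.opt.zero_grad()
        loss.backward()               # gradients stay inside this layer
        self.opt.step()
        # Detach outputs so no derivatives propagate to earlier layers.
        with torch.no_grad():
            return self.forward(x_pos), self.forward(x_neg)
```

A network would then be a stack of such layers, with each layer's train_step called on the detached outputs of the previous one, so no activities need to be stored for a backward pass. How the negative data is generated (e.g., real images paired with incorrect labels, or samples produced by the network itself) is a separate design choice and is outside the scope of this sketch.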
