Front Contribution instead of Back Propagation

Deep Learning’s outstanding track record across several domains has stemmed from the use of error backpropagation (BP). Several studies, however, have argued that BP cannot be implemented by a real brain. BP also remains a significant, unsolved bottleneck in terms of memory usage and training speed. We propose a simple, novel algorithm, the Front-Contribution algorithm, as a compact alternative to BP. The contributions of all weights with respect to the final layer weights are calculated before training commences, and all of these contributions are appended to the weights of the final layer, i.e., the effective final layer weights are a non-linear function of themselves. Our algorithm then essentially collapses the network, removing the need to update any weights outside the final layer. This reduction in trainable parameters lowers memory usage and increases training speed. We show that our algorithm produces exactly the same output as BP, in contrast to several recently proposed algorithms that only approximate BP. Our preliminary experiments demonstrate the efficacy of the proposed algorithm. Our work provides a foundation for effectively utilizing these presently under-explored "front contributions", and serves to inspire the next generation of training algorithms.
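
To make the "collapsing" idea concrete, the following is a minimal sketch, assuming a purely linear chain of layers, where folding the earlier layers into a single effective final-layer matrix is exact. It is only an illustration of the forward-collapse intuition, not the paper's actual Front-Contribution update rule for non-linear networks; all names and shapes below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# A toy 3-layer linear network: y = W3 @ W2 @ W1 @ x
W1 = rng.standard_normal((16, 8))
W2 = rng.standard_normal((16, 16))
W3 = rng.standard_normal((4, 16))

# "Front contribution" intuition: fold the earlier layers into the final layer
# once, before training, so only one effective matrix is stored and updated.
W_eff = W3 @ W2 @ W1          # shape (4, 8)

def forward_full(x):
    return W3 @ (W2 @ (W1 @ x))

def forward_collapsed(x):
    return W_eff @ x

x = rng.standard_normal(8)
# In the linear case the collapsed forward pass matches the full network exactly.
assert np.allclose(forward_full(x), forward_collapsed(x))

# A single gradient step on W_eff alone (squared error against a target t):
t = rng.standard_normal(4)
lr = 0.01
err = forward_collapsed(x) - t
W_eff -= lr * np.outer(err, x)   # only the effective final-layer weights change
```

In this simplified setting, no per-layer activations need to be stored for a backward pass, which is the source of the memory and speed savings the abstract describes; handling non-linear layers is where the paper's pre-computed "front contributions" come in.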
