Large-Scale Gradient-Free Deep Learning with Recursive Local Representation Alignment

Training deep neural networks on large-scale datasets requires significant hardware resources whose costs (even on cloud platforms) put them out of reach of smaller organizations, groups, and individuals. Backpropagation, the workhorse for training these networks, is an inherently sequential process that is difficult to parallelize. Furthermore, it requires researchers to continually develop various tricks, such as specialized weight initializations and activation functions, to ensure stable parameter optimization. Our goal is to seek an effective, neuro-biologically plausible alternative to backprop that can be used to train deep networks. In this paper, we propose recursive local representation alignment, a gradient-free learning procedure for training large-scale neural architectures. Experiments with residual networks on CIFAR-10 and the large-scale ImageNet benchmark show that our algorithm generalizes as well as backprop while converging sooner, owing to weight updates that are parallelizable and computationally less demanding. This is empirical evidence that a backprop-free algorithm can scale up to larger datasets.
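To make the flavor of the procedure concrete, below is a minimal NumPy sketch of a local-representation-alignment-style training step for a small multilayer perceptron. The layer sizes, the target step size beta, the feedback matrices E, and the helper rla_step are illustrative assumptions rather than the paper's exact formulation; the sketch only shows the core idea that each layer is nudged toward a locally computed target, so every weight update depends on nothing but that layer's error unit and the activity of the layer below, and the per-layer updates can be computed in parallel without a backward sweep through transposed weights.

```python
import numpy as np

rng = np.random.default_rng(0)

# Layer sizes for a small MLP: input -> hidden -> hidden -> output.
# (Illustrative choices, not taken from the paper.)
sizes = [784, 256, 256, 10]
L = len(sizes) - 1

# Forward weights W[l] and separate feedback/error weights E[l].
# E[l] carries the error of layer l+2 back to layer l+1's target; it is
# deliberately NOT the transpose of a forward matrix, which is what
# removes backprop's weight-transport requirement.
W = [rng.normal(0.0, 0.05, (sizes[l + 1], sizes[l])) for l in range(L)]
E = [rng.normal(0.0, 0.05, (sizes[l + 1], sizes[l + 2])) for l in range(L - 1)]

def phi(x):
    return np.tanh(x)

def rla_step(x, y, beta=0.1, lr=0.01):
    """One local-alignment-style step. x: input column vector,
    y: one-hot label column vector. Hypothetical helper for illustration."""
    # Forward pass, caching each layer's activity.
    h = [x]
    for l in range(L):
        h.append(phi(W[l] @ h[-1]))

    # Mismatch (error unit) at the output layer.
    e = [None] * (L + 1)
    e[L] = h[L] - y

    # Recursively form local targets: each hidden layer is moved a small
    # step (size beta) in the direction that reduces the error of the
    # layer above, as carried down by the feedback weights E.
    for l in range(L - 1, 0, -1):
        t = h[l] - beta * (E[l - 1] @ e[l + 1])
        e[l] = h[l] - t  # local mismatch between activity and target

    # Purely local, layer-parallel updates (Hebbian-like: post-synaptic
    # error times pre-synaptic activity); no global backward pass.
    for l in range(L):
        W[l] -= lr * (e[l + 1] @ h[l].T)

# Example usage on one random sample.
x = rng.normal(size=(784, 1))
y = np.zeros((10, 1)); y[3] = 1.0
rla_step(x, y)
```

Because each update in this sketch touches only one layer's error and the activity directly below it, the updates for all layers can in principle be applied concurrently, which is the source of the parallelism claimed in the abstract.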
