Principled Training of Neural Networks with Direct Feedback Alignment

The backpropagation algorithm has long been the canonical training method for neural networks. Modern paradigms are implicitly optimized for it, and numerous guidelines exist to ensure its proper use. Recently, synthetic gradient methods, in which the error gradient is only roughly approximated, have garnered interest. These methods not only better reflect how biological brains learn, but also open new computational possibilities, such as updating layers asynchronously. Even so, they have failed to scale past simple tasks like MNIST or CIFAR-10. This is in part due to a lack of standards, leading to ill-suited models and practices that prevent such methods from performing at their best. In this work, we focus on direct feedback alignment and present a set of best practices justified by observations of the alignment angles. We characterize a bottleneck effect that prevents alignment in narrow layers, and hypothesize it may explain why feedback alignment methods have yet to scale to large convolutional networks.
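To make the mechanism concrete, below is a minimal numerical sketch of a direct feedback alignment (DFA) update on a two-hidden-layer tanh network with a squared-error loss, together with a measurement of the alignment angle between the DFA and backpropagation update directions. The layer sizes, learning rate, and the specific angle computation are illustrative assumptions, not the paper's exact experimental setup.

```python
import numpy as np

rng = np.random.default_rng(0)
tanh_prime = lambda a: 1.0 - np.tanh(a) ** 2

# Dimensions (hypothetical): input -> h1 -> h2 -> output.
n_in, n_h1, n_h2, n_out = 32, 64, 64, 10

# Trained forward weights; B1 and B2 are the *fixed* random feedback matrices
# that project the output error directly to each hidden layer (the core of DFA).
W1 = rng.normal(0.0, 0.1, (n_h1, n_in))
W2 = rng.normal(0.0, 0.1, (n_h2, n_h1))
W3 = rng.normal(0.0, 0.1, (n_out, n_h2))
B1 = rng.normal(0.0, 0.1, (n_h1, n_out))
B2 = rng.normal(0.0, 0.1, (n_h2, n_out))

x = rng.normal(size=n_in)
y = np.eye(n_out)[3]            # one-hot target

# Forward pass.
a1 = W1 @ x; h1 = np.tanh(a1)
a2 = W2 @ h1; h2 = np.tanh(a2)
e = W3 @ h2 - y                 # output error under squared-error loss

# DFA deltas: the global error is sent straight to each hidden layer through
# its fixed random matrix, instead of flowing back through transposed weights.
d2_dfa = (B2 @ e) * tanh_prime(a2)
d1_dfa = (B1 @ e) * tanh_prime(a1)

# Backpropagation deltas, computed here only to measure alignment angles.
d2_bp = (W3.T @ e) * tanh_prime(a2)
d1_bp = (W2.T @ d2_bp) * tanh_prime(a1)

def angle(u, v):
    """Angle in degrees between two update directions."""
    c = u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12)
    return np.degrees(np.arccos(np.clip(c, -1.0, 1.0)))

print("layer-1 alignment angle:", angle(d1_dfa, d1_bp))
print("layer-2 alignment angle:", angle(d2_dfa, d2_bp))

# Weight updates use the DFA deltas; the output layer sees the true error.
lr = 0.01
W3 -= lr * np.outer(e, h2)
W2 -= lr * np.outer(d2_dfa, h1)
W1 -= lr * np.outer(d1_dfa, x)
```

Angles near 90 degrees indicate that the DFA update carries no useful gradient information for that layer, which is the kind of diagnostic the alignment-angle observations above rely on; angles well below 90 degrees indicate that the forward weights have aligned with the random feedback.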
