A Theoretical Framework for Target Propagation

The success of deep learning, a brain-inspired form of AI, has sparked interest in understanding how the brain could similarly learn across multiple layers of neurons. However, the majority of biologically plausible learning algorithms have not yet reached the performance of backpropagation (BP), nor are they built on strong theoretical foundations. Here, we analyze target propagation (TP), a popular but not yet fully understood alternative to BP, from the standpoint of mathematical optimization. Our theory shows that TP is closely related to Gauss-Newton optimization and thus differs substantially from BP. Furthermore, our analysis reveals a fundamental limitation of difference target propagation (DTP), a well-known variant of TP, in the realistic scenario of non-invertible neural networks. We provide a first solution to this problem through a novel reconstruction loss that improves feedback weight training, while simultaneously introducing architectural flexibility by allowing direct feedback connections from the output to each hidden layer. Our theory is corroborated by experimental results showing significant improvements, compared to DTP, both in performance and in the alignment of forward weight updates with loss gradients.
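The abstract does not spell out the target-computation rule, so the following is a minimal, illustrative PyTorch sketch of standard DTP for a toy two-layer network. It assumes the usual difference correction h1_hat = g(h2_hat) + h1 - g(h2) and a plain noise-based reconstruction loss for the feedback weights; the paper's improved reconstruction loss and direct output-to-hidden feedback connections are not reproduced here. Layer sizes, the tanh nonlinearity, and the step size target_lr are arbitrary choices for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy two-layer setup; sizes and nonlinearity are arbitrary illustrative choices.
forward_layers = nn.ModuleList([nn.Linear(784, 256), nn.Linear(256, 10)])
feedback_layer = nn.Linear(10, 256)  # g: approximate inverse of the output layer

def forward_pass(x):
    h1 = torch.tanh(forward_layers[0](x))
    h2 = forward_layers[1](h1)  # linear output layer
    return h1, h2

def compute_targets(h1, h2, y, target_lr=0.1):
    """Layer-local targets with the standard DTP difference correction."""
    # Output target: nudge the output along the negative loss gradient.
    h2 = h2.detach().requires_grad_(True)
    loss = F.mse_loss(h2, y)  # y: e.g. one-hot targets of shape (batch, 10)
    (grad,) = torch.autograd.grad(loss, h2)
    h2_target = (h2 - target_lr * grad).detach()
    # Hidden target: g(h2_hat) + h1 - g(h2); the difference term compensates
    # for g being only an approximate inverse of the forward mapping.
    g = lambda v: torch.tanh(feedback_layer(v))
    h1_target = (g(h2_target) + h1 - g(h2)).detach()
    return h1_target, h2_target

def local_losses(x, y, sigma=0.1):
    """Layer-local forward losses plus a simple noise-based reconstruction loss
    for the feedback weights (the paper proposes an improved variant)."""
    h1, h2 = forward_pass(x)
    h1_target, h2_target = compute_targets(h1, h2, y)
    # Block gradient flow into earlier layers so each forward loss stays local.
    h2_local = forward_layers[1](h1.detach())
    forward_loss = F.mse_loss(h1, h1_target) + F.mse_loss(h2_local, h2_target)
    # Feedback loss: reconstruct a noise-corrupted hidden activation from the
    # corresponding output activation (gradients reach only the feedback weights).
    h1_noisy = (h1 + sigma * torch.randn_like(h1)).detach()
    h2_noisy = forward_layers[1](h1_noisy).detach()
    feedback_loss = F.mse_loss(torch.tanh(feedback_layer(h2_noisy)), h1_noisy)
    return forward_loss, feedback_loss
```

In practice the two losses would be minimized with separate optimizers over the forward and feedback parameters; the paper's contribution concerns how the feedback (reconstruction) loss and feedback connectivity are designed, which this sketch does not capture.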
