Meta-Learning Bidirectional Update Rules

In this paper, we introduce a new type of generalized neural network in which neurons and synapses maintain multiple states. We show that classical gradient-based backpropagation can be seen as a special case of a two-state network, where one state holds activations and the other holds gradients, with update rules derived from the chain rule. In our generalized framework, networks have no explicit notion of gradients and never receive them; synapses and neurons are instead updated with a bidirectional Hebb-style rule parameterized by a shared, low-dimensional “genome”. We show that such genomes can be meta-learned from scratch, using either conventional optimization techniques or evolutionary strategies such as CMA-ES. The resulting update rules generalize to unseen tasks and train faster than gradient-descent-based optimizers on several standard computer vision and synthetic tasks.
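
To make this concrete, below is a minimal, hypothetical NumPy sketch of the setup: every neuron and synapse carries K states, the forward and backward passes propagate states (not gradients), and the update is a Hebb-style outer product whose few shared mixing coefficients play the role of the “genome”. All function names, shapes, and the specific mixing form are illustrative assumptions rather than the paper's exact formulation.

```python
# Hypothetical sketch of a multi-state network with a genome-parameterized,
# bidirectional Hebb-style update. Names, shapes, and the exact mixing form
# are illustrative assumptions, not the paper's formulation.
import numpy as np

K = 2                    # states per neuron and per synapse (assumed)
GENOME_DIM = 3 * K * K   # three small state-mixing matrices, shared globally


def init_layer(n_in, n_out, rng):
    # Each synapse carries K states rather than a single weight value.
    return rng.normal(0.0, 0.1, size=(K, n_in, n_out))


def forward(x_states, w_states, genome):
    # Forward pass: mix neuron states (K, n_in) with synapse states
    # (K, n_in, n_out) using genome-defined coefficients A.
    A = genome[: K * K].reshape(K, K)
    return np.tanh(np.einsum("kl,li,lij->kj", A, x_states, w_states))


def backward(h_states, w_states, genome):
    # Backward pass: propagate states, not gradients, through the same
    # synapses using a second set of genome coefficients B.
    B = genome[K * K : 2 * K * K].reshape(K, K)
    return np.tanh(np.einsum("kl,lj,lij->ki", B, h_states, w_states))


def hebb_update(w_states, pre_states, post_states, genome, lr=0.01):
    # Bidirectional Hebb-style rule: outer products of pre- and post-synaptic
    # neuron states, weighted by genome coefficients C shared by all synapses.
    C = genome[2 * K * K :].reshape(K, K)
    delta = np.einsum("kl,ki,lj->kij", C, pre_states, post_states)
    return w_states + lr * delta
```

Under this sketch the genome is just a short real-valued vector, so it could be meta-learned by scoring each candidate on how well its induced rule trains a small network on sample tasks, for example with an evolutionary strategy such as `cma.CMAEvolutionStrategy(np.zeros(GENOME_DIM), 0.5)` from the `cma` package, or with any conventional black-box or gradient-based meta-optimizer.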
