On the choice of metric in gradient-based theories of brain function

This is a PLOS Computational Biology Education paper.

The idea that the brain functions so as to minimize certain costs pervades theoretical neuroscience. Because a cost function by itself does not predict how the brain finds its minima, additional assumptions about the optimization method need to be made to predict the dynamics of physiological quantities. In this context, steepest descent (also called gradient descent) is often suggested as an algorithmic principle of optimization potentially implemented by the brain. In practice, researchers often consider the vector of partial derivatives as the gradient. However, the definition of the gradient and the notion of a steepest direction depend on the choice of a metric. Because the choice of the metric involves a large number of degrees of freedom, the predictive power of models that are based on gradient descent must be called into question, unless there are strong constraints on the choice of the metric. Here, we provide a didactic review of the mathematics of gradient descent, illustrate common pitfalls of using gradient descent as a principle of brain function with examples from the literature, and propose ways forward to constrain the metric.
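
As a brief worked illustration of the metric dependence noted above (the notation here is chosen for illustration and is not taken from the paper): for a cost function $F$ on parameters $\theta \in \mathbb{R}^n$ and a symmetric positive-definite metric $M$, the gradient is the unique vector satisfying $\langle \nabla_M F, v \rangle_M = \sum_i \frac{\partial F}{\partial \theta_i} v_i$ for every direction $v$, which gives

\[
\nabla_M F(\theta) \;=\; M^{-1}\,\frac{\partial F}{\partial \theta},
\qquad
\dot{\theta} \;=\; -\,\nabla_M F(\theta) \;=\; -\,M^{-1}\,\frac{\partial F}{\partial \theta}.
\]

Only for the Euclidean metric $M = I$ does the gradient coincide with the vector of partial derivatives. Any other positive-definite choice of $M$ (for example, the Fisher information metric underlying the natural gradient) defines a different steepest-descent direction and therefore different predicted dynamics, even though $F$ decreases along all of them, since $\frac{d}{dt} F = -\left(\frac{\partial F}{\partial \theta}\right)^{\!\top} M^{-1} \frac{\partial F}{\partial \theta} \le 0$.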
