A Correspondence Between Normalization Strategies in Artificial and Biological Neural Networks

A fundamental challenge at the interface of machine learning and neuroscience is to uncover computational principles that are shared between artificial and biological neural networks. In deep learning, normalization methods, such as batch normalization, weight normalization, and their many variants, help to stabilize hidden unit activity and accelerate network training, and these methods have been called one of the most important recent innovations for optimizing deep networks. In the brain, homeostatic plasticity represents a set of mechanisms that also stabilize and normalize network activity to lie within certain ranges, and these mechanisms are critical for maintaining normal brain function. In this survey, we discuss parallels between artificial and biological normalization methods at four spatial scales: normalization of a single neuron's activity, normalization of the synaptic weights of a neuron, normalization of a layer of neurons, and normalization of a network of neurons. We argue that both types of methods are functionally equivalent (i.e., they both push activation patterns of hidden units towards a homeostatic state, where all neurons are equally used) and that such representations can increase coding capacity, discrimination, and regularization. As a proof of concept, we develop a neural normalization algorithm, inspired by a phenomenon called synaptic scaling, and show that this algorithm performs competitively against existing normalization methods on several datasets. Overall, we hope this connection will inspire machine learners in three ways: to uncover new normalization algorithms based on established neurobiological principles; to help quantify the trade-offs of different homeostatic plasticity mechanisms used in the brain; and to offer insights about how stability may not hinder, but may actually promote, plasticity.
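To make the proposed parallel concrete, the following minimal sketch contrasts two of the ideas named above for a single hidden unit: batch normalization of its activity, and a multiplicative rescaling of its incoming weights loosely modeled on synaptic scaling. It is an illustration under stated assumptions, not the algorithm developed in this work: the function names, the ReLU unit, the target mean activity, and the update rule `factor = 1 + rate * (target - mean_activity) / target` are all hypothetical choices made for this example.

```python
import random
from statistics import mean, pstdev


def batch_normalize(activations, eps=1e-5):
    """Z-score one unit's activations across a mini-batch (no learned scale/shift)."""
    mu = mean(activations)
    var = pstdev(activations) ** 2
    return [(a - mu) / (var + eps) ** 0.5 for a in activations]


def synaptic_scaling_step(weights, mean_activity, target, rate=0.1):
    """Scale ALL incoming weights of one unit by a common factor.

    Assumed rule (hypothetical): scale up when the unit is less active than
    the target and down when it is more active. Because the factor is shared,
    the relative strengths of the synapses are preserved.
    """
    factor = 1.0 + rate * (target - mean_activity) / target
    return [w * factor for w in weights]


def unit_activity(weights, inputs):
    """Activity of a single ReLU unit."""
    return max(0.0, sum(w * x for w, x in zip(weights, inputs)))


random.seed(0)
batch = [[random.random() for _ in range(4)] for _ in range(32)]  # toy inputs
weights = [2.0, 1.5, 3.0, 0.5]   # initially over-active unit
target = 1.0                     # assumed homeostatic set point

for _ in range(200):
    avg = mean(unit_activity(weights, x) for x in batch)
    weights = synaptic_scaling_step(weights, avg, target)

final_avg = mean(unit_activity(weights, x) for x in batch)
normalized = batch_normalize([unit_activity(weights, x) for x in batch])
```

In this toy setting both procedures hold the unit's activity statistics at a fixed operating point. The batch-normalization function does so by transforming the activations directly, whereas the scaling rule does so by slowly adjusting the weights and leaves their ratios unchanged.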
