Characterizing emergent representations in a space of candidate learning rules for deep networks

How are sensory representations learned through experience? Deep learning offers a theoretical toolkit for studying how neural codes emerge under different learning rules. Studies suggesting that representations in deep networks resemble those in biological brains have mostly relied on one specific learning rule: gradient descent, the workhorse behind modern deep learning. However, it remains unclear how robust these emergent representations are to the specific choice of learning algorithm. Here we present a continuous two-dimensional space of candidate learning rules, parameterized by levels of top-down feedback and Hebbian learning. We show that this space contains five important candidate learning algorithms as specific points: Gradient Descent, Contrastive Hebbian, quasi-Predictive Coding, Hebbian, and Anti-Hebbian learning. Next, we exhaustively characterize the properties of each rule during learning of hierarchically structured data, and identify zones within this space where deep networks exhibit qualitative signatures of biological learning. We find that while a large set of algorithms achieve zero training error at convergence, only a subset show hallmarks of human semantic development such as progressive differentiation and illusory correlations. Further, only a subset adjust intermediate neural representations toward task-relevant representations, indicative of backpropagation-like behavior. Finally, we show that algorithms can differ dramatically in their learned neural representations and dynamics, providing experimentally testable hallmarks of different learning principles. Our findings provide a framework linking diverse neural representational geometries to learning principles, which can guide future experiments and offer evidence about the learning rules likely to be at work in biology.
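
To make the two-dimensional rule space concrete, below is a minimal NumPy sketch of one plausible parameterization for a linear network with a single hidden layer: a contrastive Hebbian update whose top-down feedback is scaled by a parameter lam, plus a Hebbian term weighted by gamma. The variable names, the linear settling step, and the exact form of the update are illustrative assumptions rather than the paper's equations; the intent is only to show how Gradient Descent (lam -> 0, with updates rescaled by 1/lam, cf. Xie & Seung, 2003), Contrastive Hebbian (lam = 1, gamma = 0), and pure Hebbian or Anti-Hebbian learning (lam = 0, gamma positive or negative) can sit at specific points of one continuous family.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy two-layer linear network: input -> hidden -> output.
n_in, n_hid, n_out = 8, 16, 4
W1 = rng.normal(0.0, 0.1, (n_hid, n_in))   # feedforward, input to hidden
W2 = rng.normal(0.0, 0.1, (n_out, n_hid))  # feedforward, hidden to output

def update(W1, W2, x, y, lam, gamma, lr=0.05):
    """One step of a two-parameter family of learning rules (illustrative).

    lam   -- top-down feedback strength (lam = 1: contrastive Hebbian;
             lam -> 0 with updates rescaled by 1/lam approaches
             backprop/gradient descent, cf. Xie & Seung, 2003).
    gamma -- weight of an additional Hebbian term (gamma > 0: Hebbian,
             gamma < 0: anti-Hebbian, gamma = 0: purely contrastive).
    """
    # Free phase: hidden activity settles under feedforward drive plus
    # feedback from the network's own output (linear fixed point).
    A = np.eye(n_hid) - lam * (W2.T @ W2)
    h_free = np.linalg.solve(A, W1 @ x)
    y_free = W2 @ h_free

    # Clamped phase: the output layer is held at the target y.
    h_clamp = W1 @ x + lam * (W2.T @ y)

    # Contrastive term (clamped minus free co-activity) plus Hebbian term.
    dW1 = np.outer(h_clamp - h_free, x) + gamma * np.outer(h_free, x)
    dW2 = (np.outer(y, h_clamp) - np.outer(y_free, h_free)
           + gamma * np.outer(y_free, h_free))

    return W1 + lr * dW1, W2 + lr * dW2

# Example: one update at the contrastive Hebbian point of the space.
x = rng.normal(size=n_in)
y = rng.normal(size=n_out)
W1, W2 = update(W1, W2, x, y, lam=1.0, gamma=0.0)
```

Under this (assumed) parameterization, moving through the (lam, gamma) plane smoothly interpolates between the five named rules, which is what allows the learning dynamics and emergent representations of each to be compared within a single family.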
