Generalization in Interactive Networks: The Benefits of Inhibitory Competition and Hebbian Learning

Computational models in cognitive neuroscience should ideally use biological properties and powerful computational principles to produce behavior consistent with psychological findings. Error-driven backpropagation is computationally powerful and has proven useful for modeling a range of psychological data but is not biologically plausible. Several approaches to implementing backpropagation in a biologically plausible fashion converge on the idea of using bidirectional activation propagation in interactive networks to convey error signals. This article demonstrates two main points about these error-driven interactive networks: (1) they generalize poorly due to attractor dynamics that interfere with the network's ability to produce novel combinatorial representations systematically in response to novel inputs, and (2) this generalization problem can be remedied by adding two widely used mechanistic principles, inhibitory competition and Hebbian learning, that can be independently motivated for a variety of biological, psychological, and computational reasons. Simulations using the Leabra algorithm, which combines the biologically plausible, error-driven generalized recirculation (GeneRec) learning algorithm with inhibitory competition and Hebbian learning, show that these mechanisms can result in good generalization in interactive networks. These results support the general conclusion that cognitive neuroscience models that incorporate the core mechanistic principles of interactivity, inhibitory competition, and error-driven and Hebbian learning satisfy a wider range of biological, psychological, and computational constraints than models employing a subset of these principles.
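
To make the combination of mechanisms concrete, here is a minimal NumPy sketch of the two learning terms the abstract names: a GeneRec-style contrastive (error-driven) term computed from minus-phase (expectation) and plus-phase (outcome) activations, mixed with a Hebbian term, applied over activations shaped by k-winners-take-all inhibitory competition. The function names, parameter values (`lrate`, `k_hebb`), and the simplified hard kWTA are illustrative assumptions for exposition, not the paper's actual Leabra implementation.

```python
import numpy as np

def kwta(net, k):
    """Crude k-winners-take-all inhibition: only the k units with the
    strongest net input stay active; all others are silenced. (Leabra's
    actual kWTA computes a shared inhibitory conductance placed between
    the k-th and (k+1)-th strongest inputs, but the competitive effect
    is similar.)"""
    act = np.zeros_like(net)
    winners = np.argsort(net)[-k:]
    act[winners] = 1.0 / (1.0 + np.exp(-net[winners]))  # squashed activation
    return act

def leabra_style_dw(x_minus, y_minus, x_plus, y_plus, w,
                    lrate=0.01, k_hebb=0.1):
    """One weight update mixing the two learning terms:
    - err:  contrastive-Hebbian / GeneRec error-driven term, the
            difference between plus-phase (outcome) and minus-phase
            (expectation) sender-receiver coproducts;
    - hebb: CPCA-style Hebbian term that moves each weight toward the
            sender activity, gated by receiver activity.
    x_* are sender activations, y_* are receiver activations, and
    w has shape (n_senders, n_receivers)."""
    err = np.outer(x_plus, y_plus) - np.outer(x_minus, y_minus)
    hebb = y_plus[None, :] * (x_plus[:, None] - w)
    return lrate * (k_hebb * hebb + (1.0 - k_hebb) * err)

# Toy usage: 8 senders, 5 receivers, at most 2 winners in the receiving layer.
rng = np.random.default_rng(0)
w = rng.uniform(0.25, 0.75, size=(8, 5))
x = rng.uniform(0.0, 1.0, size=8)                      # input (both phases)
y_minus = kwta(x @ w, k=2)                             # network's expectation
y_plus = kwta(x @ w + rng.uniform(0.0, 1.0, 5), k=2)   # outcome-driven activity
w += leabra_style_dw(x, y_minus, x, y_plus, w)
```

In this sketch the error-driven term pushes the network toward task-correct mappings, while the Hebbian term and the kWTA competition bias the hidden representations toward sparse, input-aligned features; per the abstract, it is this combination that counteracts the attractor dynamics responsible for poor generalization in purely error-driven interactive networks.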
