How Many Mechanisms Are Needed to Analyze Speech? A Connectionist Simulation of Structural Rule Learning in Artificial Language Acquisition

Some empirical evidence in the artificial language acquisition literature has been taken to suggest that statistical learning mechanisms are insufficient for extracting structural information from an artificial language. According to the more-than-one-mechanism (MOM) hypothesis, at least two mechanisms are required to acquire language from speech: (a) a statistical mechanism for speech segmentation, and (b) an additional rule-following mechanism for inducing grammatical regularities. In this article, we present a set of neural network studies demonstrating that a single statistical mechanism can mimic the apparent discovery of structural regularities, beyond the segmentation of speech. We argue that our results undermine one argument for the MOM hypothesis.
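To make the claim concrete, below is a minimal sketch, not the model reported in the article, of the kind of single statistical mechanism at issue: an Elman-style simple recurrent network trained to predict the next syllable in a continuous stream generated from an artificial language with nonadjacent A_i X C_i dependencies. The syllable inventory, layer sizes, learning rate, and one-step training scheme are all illustrative assumptions.

```python
# Sketch only: an illustrative Elman-style simple recurrent network, not the
# authors' reported model. Vocabulary and hyperparameters are hypothetical.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical syllable inventory: three A_i ... C_i "frames" (nonadjacent
# pairs) and three interchangeable middle X items.
frames = [("pu", "ki"), ("be", "ga"), ("ta", "du")]
middles = ["ra", "li", "fo"]
syllables = sorted({s for a, c in frames for s in (a, c)} | set(middles))
idx = {s: i for i, s in enumerate(syllables)}
V = len(syllables)

def one_hot(s):
    v = np.zeros(V)
    v[idx[s]] = 1.0
    return v

# Continuous familiarization stream of A X C words.
stream = []
for _ in range(2000):
    a, c = frames[rng.integers(len(frames))]
    stream += [a, middles[rng.integers(len(middles))], c]

# Simple recurrent network: the hidden state at t-1 is fed back as context
# at t, so the next-syllable prediction can depend on more than one syllable.
H, lr = 20, 0.1
W_xh = rng.normal(0.0, 0.1, (H, V))
W_hh = rng.normal(0.0, 0.1, (H, H))
W_hy = rng.normal(0.0, 0.1, (V, H))

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

h = np.zeros(H)
for t in range(len(stream) - 1):
    x, target = one_hot(stream[t]), one_hot(stream[t + 1])
    h_new = np.tanh(W_xh @ x + W_hh @ h)
    y = softmax(W_hy @ h_new)
    # One-step gradient descent on cross-entropy (no unrolling through time);
    # crude, but enough to illustrate prediction-driven statistical learning.
    dy = y - target
    dh = (W_hy.T @ dy) * (1.0 - h_new ** 2)
    W_hy -= lr * np.outer(dy, h_new)
    W_xh -= lr * np.outer(dh, x)
    W_hh -= lr * np.outer(dh, h)
    h = h_new

# Probe: after "pu" then a middle syllable, does the network favor "ki",
# the frame-final syllable licensed by the nonadjacent frame-initial one?
h = np.zeros(H)
for s in ("pu", "ra"):
    h = np.tanh(W_xh @ one_hot(s) + W_hh @ h)
print({s: round(float(p), 3) for s, p in zip(syllables, softmax(W_hy @ h))})
```

To the extent training succeeds, the network's prediction at a word-medial syllable is shaped by the frame-initial syllable two steps back: behavior that can look like rule-following while arising from a single prediction-driven statistical learner.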
