Spoken word recognition without a TRACE

How do we map the rapid input of spoken language onto phonological and lexical representations over time? Attempts at psychologically tractable computational models of spoken word recognition tend either to ignore time or to transform the temporal input into a spatial representation. TRACE, a connectionist model with broad and deep coverage of speech perception and spoken word recognition phenomena, takes the latter approach, using exclusively time-specific units at every level of representation. TRACE reduplicates featural, phonemic, and lexical units at every time step in a large memory trace, with rich interconnections (excitatory forward and backward connections between levels and inhibitory links within levels). As the length of the memory trace grows, or as the phoneme and lexical inventories are scaled to realistic sizes, this reduplication of time-specific (temporal-position-specific) units leads to a dramatic proliferation of units and connections, raising the question of whether a more efficient approach is possible. Our starting point is the observation that models of visual object recognition, including visual word recognition, have grappled with the problem of spatial invariance and arrived at solutions other than a fully reduplicative strategy like that of TRACE. This inspires a new model of spoken word recognition that combines time-specific phoneme representations similar to those in TRACE with higher-level representations based on string kernels: temporally independent (time-invariant) diphone and lexical units. This reduces the number of necessary units and connections by several orders of magnitude relative to TRACE. Critically, we compare the new model to TRACE on a set of key phenomena, demonstrating that the new model inherits much of TRACE's behavior and that the drastic computational savings do not come at the cost of explanatory power.
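To make the scaling argument concrete, the sketch below gives a rough, back-of-the-envelope illustration (not the authors' implementation) of why time-specific reduplication explodes with trace length while a string-kernel scheme does not, together with a minimal open-diphone encoding of a phoneme string. All parameter values (inventory sizes, trace length) and the `open_diphones` helper are illustrative assumptions, not figures or code from the paper.

```python
from itertools import combinations

# --- Illustrative parameters (assumptions, not values from the paper) ---
N_PHONEMES = 40      # size of a realistic phoneme inventory
N_WORDS = 20_000     # size of a realistic lexicon
TRACE_LEN = 200      # number of time slices in the memory trace

# TRACE-style reduplication: phoneme and word units are copied at
# (roughly) every time slice, so unit counts scale with trace length.
trace_units = (N_PHONEMES + N_WORDS) * TRACE_LEN

# String-kernel scheme: phoneme units remain time-specific, but diphone
# and word units are time-invariant, so each is represented only once.
diphone_units = N_PHONEMES * N_PHONEMES          # ordered phoneme pairs
kernel_units = N_PHONEMES * TRACE_LEN + diphone_units + N_WORDS

print(f"TRACE-style units:   {trace_units:,}")
print(f"String-kernel units: {kernel_units:,}")
# Connection counts grow roughly with the square of unit counts,
# which is where the savings of several orders of magnitude arise.

def open_diphones(phonemes):
    """Return the set of ordered (not necessarily adjacent) phoneme
    pairs for a word: a simplified, unweighted string-kernel code."""
    return {a + b for a, b in combinations(phonemes, 2)}

# Example: a time-invariant code for 'cat' (/k ae t/)
print(open_diphones(["k", "ae", "t"]))   # {'kae', 'kt', 'aet'}
```

The key design point the sketch is meant to convey is that the open-diphone code stays the same wherever the word occurs in the input, so higher-level units need not be duplicated across time slices.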
