Spoken Word Recognition: Historical Roots, Current Theoretical Issues, and Some New Directions

The most distinctive hallmark of human spoken word recognition (SWR) is its perceptual robustness to acoustic variability in the transmission and reception of the talker’s linguistic message. Normal-hearing listeners adapt rapidly, and with little apparent effort, to many different sources of variability in the speech signal and in their immediate listening environment. Sensory processing and early encoding of speech into lexical representations are critical for robust SWR. However, audibility and sensory processing alone cannot account for the robustness of SWR, especially under degraded listening conditions. In this chapter, we describe the historical roots of the field, present a selective review of the principal theoretical issues, and then consider several contemporary models of SWR. We conclude by identifying promising new directions and future challenges, including the perception of foreign-accented speech and SWR by deaf children with cochlear implants, by bilinguals, and by older adults.
