Shortlist B: a Bayesian model of continuous speech recognition.

A Bayesian model of continuous speech recognition is presented. It is based on Shortlist (D. Norris, 1994; D. Norris, J. M. McQueen, A. Cutler, & S. Butterfield, 1997) and shares many of its key assumptions: parallel competitive evaluation of multiple lexical hypotheses, phonologically abstract prelexical and lexical representations, a feedforward architecture with no online feedback, and a lexical segmentation algorithm based on the viability of chunks of the input as possible words. Shortlist B is radically different from its predecessor in two respects. First, whereas Shortlist was a connectionist model based on interactive-activation principles, Shortlist B is based on Bayesian principles. Second, the input to Shortlist B is no longer a sequence of discrete phonemes; it is a sequence of multiple phoneme probabilities over 3 time slices per segment, derived from the performance of listeners in a large-scale gating study. Simulations are presented showing that the model can account for key findings: data on the segmentation of continuous speech, word frequency effects, the effects of mispronunciations on word recognition, and evidence on lexical involvement in phonemic decision making. The success of Shortlist B suggests that listeners make optimal Bayesian decisions during spoken-word recognition.
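The core idea of combining per-slice phoneme probabilities with word priors can be illustrated with a minimal sketch. This is not the authors' implementation: the function name, the toy lexicon, the probability values, and the smoothing floor are all hypothetical. It also assumes that every candidate word spans the whole input, so it ignores segmentation of continuous speech and the possible-word constraint. It only shows Bayes' rule applied to word hypotheses, with 3 time slices per segment and word frequency acting as the prior.

```python
from math import prod

SLICES_PER_SEGMENT = 3  # the abstract states 3 time slices per segment
FLOOR = 1e-6            # hypothetical floor for phonemes absent from a slice


def word_posteriors(slice_probs, lexicon, priors):
    """Return P(word | evidence) for each candidate word via Bayes' rule.

    slice_probs: list of dicts, one per time slice, mapping phoneme -> probability
    lexicon:     dict mapping word -> list of phonemes (one per segment)
    priors:      dict mapping word -> prior probability (e.g. relative frequency)

    Assumption: every word has exactly len(slice_probs) / 3 segments.
    """
    unnormalised = {}
    for word, phonemes in lexicon.items():
        # Likelihood: product of the probability of the word's phoneme
        # in each of the time slices that the phoneme occupies.
        likelihood = prod(
            slice_probs[seg * SLICES_PER_SEGMENT + s].get(phoneme, FLOOR)
            for seg, phoneme in enumerate(phonemes)
            for s in range(SLICES_PER_SEGMENT)
        )
        unnormalised[word] = priors[word] * likelihood
    total = sum(unnormalised.values())
    return {word: value / total for word, value in unnormalised.items()}


# Toy input (all values invented): the final segment is ambiguous
# between /t/ and /p/ and gradually resolves towards /t/.
slice_probs = (
    [{"k": 0.9, "g": 0.1}] * 3
    + [{"a": 0.8, "e": 0.2}] * 3
    + [{"t": 0.5, "p": 0.5}, {"t": 0.6, "p": 0.4}, {"t": 0.8, "p": 0.2}]
)
lexicon = {"cat": ["k", "a", "t"], "cap": ["k", "a", "p"]}
priors = {"cat": 0.7, "cap": 0.3}  # stands in for word frequency

posteriors = word_posteriors(slice_probs, lexicon, priors)
print(posteriors)
```

In this toy case the higher-frequency word also receives more perceptual support, so both the prior and the likelihood favour "cat"; lowering its prior would show how frequency and evidence trade off.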

[1]  A. Samuel,et al.  Perceptual learning for speech , 2009, Attention, perception & psychophysics.

[2]  G. Altmann,et al.  The Oxford Handbook of Psycholinguistics , 2007 .

[3]  James M. McQueen,et al.  Eight questions about spoken-word recognition , 2007 .

[4]  Kevin N. Gurney,et al.  The Basal Ganglia and Cortex Implement Optimal Decision Making Between Alternative Actions , 2007, Neural Computation.

[5]  R. Jacobs,et al.  WITHIN CATEGORY PHONETIC VARIABILITY AFFECTS PERCEPTUAL UNCERTAINTY , 2007 .

[6]  Naomi H. Feldman,et al.  A Rational Account of the Perceptual Magnet Effect , 2007 .

[7]  Anne Cutler,et al.  Are there really interactive processes in speech perception? , 2006, Trends in Cognitive Sciences.

[8]  James L. McClelland,et al.  An interactive Hebbian account of lexically guided tuning of speech perception , 2006, Psychonomic bulletin & review.

[9]  Anne Cutler,et al.  Phonological Abstraction in the Mental Lexicon , 2006, Cogn. Sci..

[10]  Jonathan D. Cohen,et al.  The physics of optimal decision making: a formal analysis of models of performance in two-alternative forced-choice tasks. , 2006, Psychological review.

[11]  Anne Cutler,et al.  Phonological and conceptual activation in speech comprehension , 2006, Cognitive Psychology.

[12]  Lori L. Holt,et al.  Are there interactive processes in speech perception? , 2006, Trends in Cognitive Sciences.

[13]  J. Tenenbaum,et al.  Theory-based Bayesian models of inductive learning and reasoning , 2006, Trends in Cognitive Sciences.

[14]  Dennis Norris,et al.  The Bayesian reader: explaining word recognition as an optimal Bayesian decision process. , 2006, Psychological review.

[15]  J. McQueen,et al.  The effect of voice onset time differences on lexical access in Dutch. , 2006, Journal of experimental psychology. Human perception and performance.

[16]  Keren B. Shatzman,et al.  Segment duration as a cue to word boundaries in spoken-word recognition , 2006, Perception & psychophysics.

[17]  Jay I. Myung,et al.  Global model analysis by parameter space partitioning. , 2006, Psychological review.

[18]  Louis ten Bosch,et al.  How Should a Speech Recognizer Work? , 2005, Cogn. Sci..

[19]  Laurence White,et al.  Integration of multiple speech segmentation cues: a hierarchical framework. , 2005, Journal of experimental psychology. General.

[20]  Philipp Slusallek,et al.  Introduction to real-time ray tracing , 2005, SIGGRAPH Courses.

[21]  Anne Cutler,et al.  Phonological and statistical effects on timing of speech perception: Insights from a database of Dutch diphone perception , 2005, Speech Commun..

[22]  James L. McClelland,et al.  Computational and behavioral investigations of lexically induced delays in phoneme recognition , 2005 .

[23]  Anne Cutler,et al.  Twenty-first century psycholinguistics : four cornerstones , 2005 .

[24]  Johanna D. Moore,et al.  Proceedings of the 28th Annual Conference of the Cognitive Science Society , 2005 .

[25]  Anne Cutler,et al.  Chapitre 3. La perception de la parole en espagnol : un cas particulier ? , 2004, Psycholinguistique Cognitive.

[26]  W. Marslen-Wilson Accessing Spoken Words : The Importance of Word Onsets , 2004 .

[27]  Rajesh P. N. Rao Bayesian Computation in Recurrent Neural Circuits , 2004, Neural Computation.

[28]  Anne Pier Salverda,et al.  The role of prosodic boundaries in the resolution of lexical embedding in speech comprehension , 2003, Cognition.

[29]  Erik D. Reichle,et al.  The E-Z Reader model of eye-movement control in reading: Comparisons to other models , 2003, Behavioral and Brain Sciences.

[30]  Denis G. Pelli,et al.  The remarkable inefficiency of word recognition , 2003, Nature.

[31]  José Benkí,et al.  Analysis of English Nonsense Syllable Recognition in Noise , 2003, Phonetica.

[32]  Elizabeth K. Johnson,et al.  Lexical viability constraints on speech segmentation by infants , 2003, Cognitive Psychology.

[33]  Anne Cutler,et al.  Unfolding of phonetic information over time: a database of Dutch diphone perception. , 2003, The Journal of the Acoustical Society of America.

[34]  M. Tanenhaus,et al.  Gradient effects of within-category phonetic variation on lexical access , 2002, Cognition.

[35]  W. Marslen-Wilson,et al.  Representation and competition in the perception of spoken words , 2002, Cognitive Psychology.

[36]  Anne Cutler,et al.  Universality Versus Language-Specificity in Listening to Running Speech , 2002, Psychological science.

[37]  Matthew H. Davis,et al.  Leading Up the Lexical Garden Path: Segmentation and Ambiguity in Spoken Word Recognition , 2002 .

[38]  Carlos Gussenhoven,et al.  Laboratory Phonology 7 , 2002 .

[39]  Jessica Maye,et al.  Infant sensitivity to distributional information can affect phonetic discrimination , 2002, Cognition.

[40]  Anne Cutler,et al.  Language-universal constraints on speech segmentation , 2001 .

[41]  J. Tenenbaum,et al.  Generalization, similarity, and Bayesian inference. , 2001, The Behavioral and brain sciences.

[42]  M. Tanenhaus,et al.  Time Course of Frequency Effects in Spoken-Word Recognition: Evidence from Eye Movements , 2001, Cognitive Psychology.

[43]  Hermann Ney,et al.  Confidence measures for large vocabulary continuous speech recognition , 2001, IEEE Trans. Speech Audio Process..

[44]  R. Newman,et al.  The perceptual consequences of within-talker variability in fricative production. , 2001, The Journal of the Acoustical Society of America.

[45]  Janet B. Pierrehumbert,et al.  Word-specific phonetics , 2001 .

[46]  Anne Cutler,et al.  Feedback on feedback on feedback: It's feedforward , 2000, Behavioral and Brain Sciences.

[47]  D Norris,et al.  Merging information in speech recognition: Feedback is never necessary , 2000, Behavioral and Brain Sciences.

[48]  Dominic W. Massaro The horse race to language understanding: FLMP was first out of the gate, and has yet to be overtaken , 2000 .

[49]  G. Oden Implausibility versus misinterpretation of the FLMP , 2000, Behavioral and Brain Sciences.

[50]  P. Tabossi,et al.  Syllables in the processing of spoken Italian. , 2000, Journal of experimental psychology. Human perception and performance.

[51]  S. Goldinger,et al.  Phonetic priming, neighborhood activation, and PARSYN , 2000, Perception & psychophysics.

[52]  Proceedings of the 14th international congress of phonetic sciences , 2000 .

[53]  Jack L. Vevea,et al.  Why do categories affect stimulus judgment? , 2000, Journal of experimental psychology. General.

[54]  Louis Boves,et al.  Weighting phone confidence measures for automatic speech recognition , 2000 .

[55]  Anne Cutler,et al.  Lexical influence in phonetic decision-making: Evidence from subcategorical mismatches , 1999 .

[56]  P. Luce,et al.  Probabilistic Phonotactics and Neighborhood Activation in Spoken Word Recognition , 1999 .

[57]  Ulrich H. Frauenfelder,et al.  The Recognition of Spoken Words , 1999 .

[58]  J. Werker,et al.  Influences on infant speech processing: toward a new synthesis. , 1999, Annual review of psychology.

[59]  P. Friederici Language Comprehension: A Biological Perspective , 1999, Springer Berlin Heidelberg.

[60]  Anne Cutler,et al.  Spotting (different types of) words in (different types of) context , 1998, ICSLP.

[61]  M. Pitt,et al.  Is Compensation for Coarticulation Mediated by the Lexicon , 1998 .

[62]  J. McQueen Segmentation of Continuous Speech Using Phonotactics , 1998 .

[63]  P. Luce,et al.  When Words Compete: Levels of Processing in Perception of Spoken Words , 1998 .

[64]  Paul D. Allopenna,et al.  Tracking the Time Course of Spoken Word Recognition Using Eye Movements: Evidence for Continuous Mapping Models , 1998 .

[65]  D. Pisoni,et al.  Recognizing Spoken Words: The Neighborhood Activation Model , 1998, Ear and hearing.

[66]  David Glasspool,et al.  4th Neural Computation and Psychology Workshop, London, 9–11 April 1997 , 1998, Perspectives in Neural Computing.

[67]  D. Norris,et al.  The Possible-Word Constraint in the Segmentation of Continuous Speech , 1997, Cognitive Psychology.

[68]  Dawn G. Blasko,et al.  Similarity Mapping in Spoken Word Recognition , 1997 .

[69]  Kelli J. Johnson The auditory/perceptual basis for speech segmentation , 1997 .

[70]  J. Mullennix,et al.  Talker Variability in Speech Processing , 1997 .

[71]  William D. Marslen-Wilson,et al.  Recognising Embedded Words in Connected Speech: Context and Competition , 1997, NCPW.

[72]  W. Marslen-Wilson,et al.  Perceptual distance and competition in lexical access. , 1996, Journal of experimental psychology. Human perception and performance.

[73]  Cristina Burani,et al.  Word Identification in Fluent Speech , 1995 .

[74]  Ted Briscoe,et al.  Models of continuous speech recognition and the contents of the vocabulary , 1995 .

[75]  D W Massaro,et al.  Independence of lexical context and phonological information in speech perception. , 1995, Journal of experimental psychology. Learning, memory, and cognition.

[76]  J. Vroomen,et al.  Metrical segmentation and lexical inhibition in spoken word recognition , 1995 .

[77]  P Iverson,et al.  Mapping the perceptual magnet effect for speech using signal detection theory and multidimensional scaling. , 1995, The Journal of the Acoustical Society of America.

[78]  P C Gordon,et al.  Lexical and prelexical influences on word segmentation: evidence from priming. , 1995, Journal of experimental psychology. Human perception and performance.

[79]  Anne Cutler,et al.  Competition and segmentation in spoken word recognition , 1994, ICSLP.

[80]  W Marslen-Wilson,et al.  Levels of perceptual representation and process in lexical access: words, phonemes, and features. , 1994, Psychological review.

[81]  D. Norris Shortlist: a connectionist model of continuous speech recognition , 1994, Cognition.

[82]  S. Blumstein,et al.  The effect of subphonetic differences on lexical access , 1994, Cognition.

[83]  D. Norris,et al.  Competition in spoken word recognition: Spotting words in other words , 1994 .

[84]  Dawn G. Blasko,et al.  Do the Beginnings of Spoken Words Have a Special Status in Auditory Word Recognition , 1993 .

[85]  A. Cutler,et al.  Rhythmic cues to speech segmentation: Evidence from juncture misperception , 1992 .

[86]  K. Stevens,et al.  Linguistic experience alters phonetic perception in infants by 6 months of age. , 1992, Science.

[87]  D. Massaro,et al.  Integration versus interactive activation: The joint influence of stimulus and context in perception , 1991, Cognitive Psychology.

[88]  Dawn G. Blasko,et al.  Effects of subsequent sentence context in auditory word recognition: Temporal and linguistic constraints , 1991 .

[89]  James L. McClelland Stochastic interactive processes and the effect of context on perception , 1991, Cognitive Psychology.

[90]  J. Mullennix,et al.  Word familiarity and frequency in visual and auditory word recognition. , 1990, Journal of experimental psychology. Learning, memory, and cognition.

[91]  Michael Garman,et al.  Psycholinguistics: Accessing the mental lexicon , 1990 .

[92]  John R. Anderson The Adaptive Character of Thought , 1990 .

[93]  D. Massaro,et al.  Models of integration given multiple sources of information. , 1990, Psychological review.

[94]  Steve Young,et al.  Token passing: a simple conceptual model for connected speech recognition systems , 1989 .

[95]  D. Massaro Testing between the TRACE model and the fuzzy logical model of speech perception , 1989, Cognitive Psychology.

[96]  Dominic W. Massaro,et al.  Experimental Psychology: An Information Processing Approach , 1989 .

[97]  R. Shillcock,et al.  The recognition of words after their acoustic offsets in spontaneous speech: Effects of subsequent context , 1988, Perception & psychophysics.

[98]  Anne Cutler,et al.  The role of strong syllables in segmentation for lexical access , 1988 .

[99]  James L. McClelland,et al.  An interactive activation model of context effects in letter perception: Part 1. An account of basic findings , 1988 .

[100]  D. Massaro Speech Perception By Ear and Eye: A Paradigm for Psychological Inquiry , 1989 .

[101]  Stephen Grossberg,et al.  A massively parallel architecture for a self-organizing neural pattern recognition machine , 1988, Comput. Vis. Graph. Image Process..

[102]  Anne Cutler,et al.  The predominance of strong initial syllables in the English vocabulary , 1987 .

[103]  Kenneth Ward Church,et al.  Phonological parsing and lexical retrieval , 1987, Cognition.

[104]  W. Marslen-Wilson Functional parallelism in spoken word-recognition , 1987, Cognition.

[105]  J. Perkell,et al.  Invariance and variability in speech processes , 1987 .

[106]  P. Luce,et al.  A computational analysis of uniqueness points in auditory word recognition , 1986, Perception & psychophysics.

[107]  M. Taft,et al.  Exploring the cohort model of spoken word recognition , 1986, Cognition.

[108]  James L. McClelland,et al.  The TRACE model of speech perception , 1986, Cognitive Psychology.

[109]  James L. McClelland,et al.  PDP models and general issues in cognitive science , 1986 .

[110]  Roger K. Moore Computer Speech and Language , 1986 .

[111]  F. Grosjean The recognition of words after their acoustic offset: Evidence and implications , 1985, Perception & psychophysics.

[112]  James L. McClelland,et al.  Levels indeed! A response to Broadbent , 1985 .

[113]  J. H. Bertera,et al.  Latency of sequential eye movements: implications for reading. , 1983, Journal of experimental psychology. Human perception and performance.

[114]  D W Massaro,et al.  Evaluation and Integration of Visual and Auditory Information in Speech Perception , 1983, Journal of experimental psychology. Human perception and performance.

[115]  D. Massaro,et al.  Phonological context in speech perception , 1983, Perception & psychophysics.

[116]  R. Weale Vision. A Computational Investigation Into the Human Representation and Processing of Visual Information. David Marr , 1983 .

[117]  W. Nelson Francis,et al.  FREQUENCY ANALYSIS OF ENGLISH USAGE: LEXICON AND GRAMMAR , 1983 .

[118]  James L. McClelland,et al.  An interactive activation model of context effects in letter perception: I. An account of basic findings. , 1981 .

[119]  J. Sawusch,et al.  Adaptation and contrast in the perception of voicing. , 1981, Journal of experimental psychology. Human perception and performance.

[120]  G. Underwood Strategies of information processing , 1980 .

[121]  W. Ganong Phonetic categorization in auditory word perception. , 1980, Journal of experimental psychology. Human perception and performance.

[122]  D W Massaro,et al.  Letter information and orthographic context in word perception. , 1979, Journal of experimental psychology. Human perception and performance.

[123]  D. Massaro,et al.  Integration of featural information in speech perception. , 1978, Psychological review.

[124]  C. P. Whaley Word–nonword classification time. , 1978 .

[125]  William D Marslen-Wilson,et al.  Processing interactions and lexical access during word recognition in continuous speech , 1978, Cognitive Psychology.

[126]  Dominic W. Massaro,et al.  A Stage Model of Reading and Listening. , 1978 .

[127]  L. Nakatani,et al.  Locus of segmental cues for word juncture. , 1977, The Journal of the Acoustical Society of America.

[128]  A. Healy,et al.  Units of speech perception: Phoneme and syllable , 1976 .

[129]  M. D. Wang,et al.  Consonant confusions in noise: a study of perceptual features. , 1973, The Journal of the Acoustical Society of America.

[130]  David McNeill,et al.  The Perceptual Reality of Phonemes, Syllables, Words, and Sentences. , 1973 .

[131]  John J. L. Morton,et al.  Interaction of information in word recognition. , 1969 .

[132]  Gerald S. Rogers,et al.  Mathematical Statistics: A Decision Theoretic Approach , 1967 .

[133]  Andrew J. Viterbi,et al.  Error bounds for convolutional codes and an asymptotically optimum decoding algorithm , 1967, IEEE Trans. Inf. Theory.

[134]  H. Savin Word‐Frequency Effect and Errors in the Perception of Speech , 1963 .

[135]  C. L. Mallows,et al.  Individual Choice Behaviour. , 1961 .

[136]  Irwin Pollack,et al.  Analysis of Incorrect Responses to an Unknown Message Set , 1960 .

[137]  H. Rubenstein,et al.  Intelligibility of Known and Unknown Message Sets , 1959 .

[138]  R. Duncan Luce,et al.  Individual Choice Behavior , 1959 .

[139]  J. M. Pickett,et al.  Perception of Vowels Heard in Noises of Various Spectra , 1957 .

[140]  D. Howes On the Relation between the Intelligibility and Frequency of Occurrence of English Words , 1957 .

[141]  G. A. Miller,et al.  An Analysis of Perceptual Confusions Among Some English Consonants , 1955 .

[142]  D. Howes On the interpretation of word frequency as a variable affecting speed of recognition. , 1954, Journal of experimental psychology.