Modeling the interaction of phonemic intelligibility and lexical structure in audiovisual word recognition

Studies of audiovisual perception of spoken language have mostly modeled phoneme identification in nonsense syllables, but it is doubtful that models or theories of phonetic processing alone can adequately account for audiovisual word recognition. The present study took a computational approach to examine how lexical structure may additionally constrain word recognition, given the phonetic information available under vocoded audio, visual, and audiovisual stimulus conditions. Subjects made phonemic identification judgments on recordings of spoken nonsense syllables. Hierarchical cluster analysis was first used to select classes of perceptually equivalent phonemes for each stimulus condition, and a machine-readable, phonemically transcribed lexicon was then retranscribed in terms of these phonemic equivalence classes. Several statistics were computed for each transcription, including percent information extracted, percent words unique, and expected class size. The findings suggest that superadditive levels of audiovisual enhancement are more likely for monosyllabic than for multisyllabic words. That is, impoverished phonetic information may be sufficient to recognize multisyllabic words, but the recognition of monosyllabic words seems to require additional phonetic information.
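The retranscription step and the three lexical statistics named above can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration: the four-word toy lexicon, the phoneme-to-class mapping (as if bilabials were visually confusable), the function names, and the treatment of all words as equally probable. The study itself may have used frequency-weighted definitions, so the numbers here only show the mechanics, not the reported results.

```python
from collections import Counter
from math import log2

def retranscribe(phonemes, phoneme_to_class):
    """Replace each phoneme by the label of its equivalence class."""
    return tuple(phoneme_to_class[p] for p in phonemes)

def lexicon_statistics(lexicon, phoneme_to_class):
    """Compute unweighted lexical statistics for a retranscribed lexicon.

    Assumption: every word is equally probable (no frequency weighting).
    """
    patterns = {word: retranscribe(phones, phoneme_to_class)
                for word, phones in lexicon.items()}
    # Number of words sharing each retranscribed pattern.
    sizes = Counter(patterns.values())
    n = len(lexicon)

    # Percent of words whose retranscription is shared with no other word.
    n_unique = sum(1 for pat in patterns.values() if sizes[pat] == 1)
    percent_unique = 100.0 * n_unique / n

    # Average size of the lexical class that a randomly chosen word falls in.
    expected_class_size = sum(s * s for s in sizes.values()) / n

    # Share of the lexicon's information (log2 n bits under the uniform
    # assumption) that the retranscription preserves.
    residual_bits = sum((s / n) * log2(s) for s in sizes.values())
    percent_information = 100.0 * (1 - residual_bits / log2(n)) if n > 1 else 100.0

    return {
        "percent_words_unique": percent_unique,
        "expected_class_size": expected_class_size,
        "percent_information_extracted": percent_information,
    }

# Hypothetical toy lexicon and equivalence classes (not from the study).
toy_lexicon = {
    "pat": ["p", "ae", "t"],
    "bat": ["b", "ae", "t"],
    "mat": ["m", "ae", "t"],
    "cat": ["k", "ae", "t"],
}
toy_classes = {"p": "B", "b": "B", "m": "B", "k": "K", "ae": "A", "t": "T"}

stats = lexicon_statistics(toy_lexicon, toy_classes)
print(stats)
```

In this toy case "pat", "bat", and "mat" collapse into a single pattern while "cat" stays unique, so one word in four remains unique and the expected class size is 2.5. Longer words contain more segments that can disambiguate them, which is the intuition behind the contrast between monosyllabic and multisyllabic words.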
