Cross-linguistic comparisons in the integration of visual and auditory speech

We examined how speakers of different languages perceive speech in face-to-face communication. Participants identified unimodal and bimodal speech syllables drawn from synthetic auditory and visual five-step /ba/-/da/ continua. In the first experiment, Dutch speakers identified the test syllables as either /ba/ or /da/. To test the robustness of the results, Dutch and English speakers in a second experiment were given a completely open-ended response task, whereas tasks in previous studies had always specified a fixed set of alternatives. The two-alternative and open-ended tasks yielded similar results: identification of the speech segments was influenced by both the auditory and the visual sources of information. The results falsified an auditory dominance model (ADM), which assumes that visible speech contributes only when the audible speech is of poor quality. They also falsified an additive model of perception (AMP), in which the auditory and visual sources are combined linearly. The fuzzy logical model of perception (FLMP) provided a good description of performance, supporting the claim that multiple sources of continuous information are evaluated and integrated in speech perception. These results replicate earlier findings with English, Spanish, and Japanese speakers. Although there were significant performance differences across language groups, the model analyses indicated no differences in the nature of information processing; the performance differences were instead attributable to information differences arising from the different phonologies of Dutch and English. These results suggest that the underlying mechanisms of speech perception are similar across languages.
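The contrast between the FLMP's multiplicative integration rule and the additive alternative can be made concrete with a small sketch. In the standard two-alternative form of the FLMP, a_i and v_j denote the degree of auditory and visual support for /da/ at continuum steps i and j, and the predicted probability of a /da/ response is a_i·v_j / (a_i·v_j + (1−a_i)(1−v_j)); the additive model instead combines the two sources as a weighted sum. The support values below are illustrative placeholders, not fitted parameters from the study.

```python
import numpy as np

# Illustrative (not fitted) degrees of support for /da/ along the
# five-step auditory and visual /ba/-/da/ continua.
a = np.array([0.05, 0.25, 0.50, 0.75, 0.95])  # auditory support, a_i
v = np.array([0.10, 0.30, 0.50, 0.70, 0.90])  # visual support, v_j

def flmp(a_i, v_j):
    """Fuzzy logical model of perception: multiplicative integration
    followed by relative-goodness (Luce choice rule) normalization."""
    da = a_i * v_j
    ba = (1.0 - a_i) * (1.0 - v_j)
    return da / (da + ba)

def amp(a_i, v_j, w=0.5):
    """Additive model of perception: a weighted linear combination
    of the two sources (w is a free weighting parameter)."""
    return w * a_i + (1.0 - w) * v_j

# Predicted P(/da/) for every cell of the 5 x 5 bimodal design.
flmp_pred = flmp(a[:, None], v[None, :])
amp_pred = amp(a[:, None], v[None, :])

# Signature difference: the FLMP predicts that the visual effect is
# largest when the auditory information is ambiguous (middle rows),
# whereas the additive model predicts parallel, equally spaced curves.
print(np.round(flmp_pred, 2))
print(np.round(amp_pred, 2))
```

In a model test of the kind reported here, such predictions would be fit to the observed identification proportions (e.g., by minimizing root-mean-squared deviation), and the model giving the better account across all unimodal and bimodal conditions is preferred.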
