Visual contribution to the multistable perception of speech

The multistable perception of speech, or verbal transformation effect, refers to the perceptual changes experienced while listening to a speech form that is repeated rapidly and continuously. To test whether visual information from the speaker's articulatory gestures can modify the emergence and stability of verbal auditory percepts, subjects were instructed to report any perceptual changes during unimodal, audiovisual, and incongruent audiovisual presentations of distinct repeated syllables. In a first experiment, the perceptual stability of reported auditory percepts was significantly modulated by the modality of presentation. In a second experiment, audiovisual stimuli consisting of a stable audio track dubbed with a video track alternating between congruent and incongruent stimuli revealed a strong correlation between the timing of perceptual transitions and the timing of video switches. Finally, a third experiment showed that the vocal tract opening onset event provided by the visual input could serve as a bootstrap mechanism in the search for transformations. Altogether, these results demonstrate the capacity of visual information to control the multistable perception of speech in both its phonetic content and its temporal course. The verbal transformation effect thus provides a useful experimental paradigm for exploring audiovisual interactions in speech perception.