Audio-visual speech scene analysis: characterization of the dynamics of unbinding and rebinding the McGurk effect.

While audiovisual interactions in speech perception have long been considered automatic, recent data suggest that this is not the case. In a previous study, Nahorna et al. [(2012). J. Acoust. Soc. Am. 132, 1061-1077] showed that the McGurk effect is reduced by a preceding incoherent audiovisual context. This was interpreted as evidence for an audiovisual binding stage controlling the fusion process: incoherence would produce unbinding and decrease the weight of the visual input in fusion. The present paper explores the audiovisual binding system to characterize its dynamics. A first experiment assesses the dynamics of unbinding and shows that it is rapid: an incoherent context less than 0.5 s long (typically one syllable) suffices to produce a maximal reduction in the McGurk effect. A second experiment tests the rebinding process by presenting a short period of either coherent material or silence after the incoherent unbinding context. Coherence produces rebinding, with a recovery of the McGurk effect, while silence produces no rebinding and hence freezes the unbinding process. These experiments are interpreted in the framework of an audiovisual speech scene analysis process that assesses the perceptual organization of an audiovisual speech input before a decision takes place at a higher processing stage.

[1] R. Chartier, Binding and unbinding, 2017.

[2] Kevin G. Munhall et al., Detection of Audiovisual Speech Correspondences Without Visual Awareness, 2013, Psychological Science.

[3] Jacqueline Leybaert et al., Degradation of Labial Information Modifies Audiovisual Speech Perception in Cochlear-Implanted Children, 2013, Ear and Hearing.

[4] Frédéric Berthommier et al., Binding and unbinding the auditory and visual streams in the McGurk effect, 2012, The Journal of the Acoustical Society of America.

[5] Daniel Pressnitzer, The initial phase of auditory and visual scene analysis, 2012, Philosophical Transactions of the Royal Society B: Biological Sciences.

[6] Salvador Soto-Faraco et al., Searching for audiovisual correspondence in multiple speaker scenarios, 2011, Experimental Brain Research.

[7] Tobias S. Andersen et al., Multistage audiovisual integration of speech: dissociating identification and detection, 2011, Experimental Brain Research.

[8] Jean-Luc Schwartz et al., Disentangling unisensory from fusion effects in the attentional modulation of McGurk effects: a Bayesian modeling study suggests that fusion is attention-dependent, 2010, AVSP.

[9] U. Noppeney et al., Perceptual Decisions Formed by Accumulation of Audiovisual Evidence in Prefrontal Cortex, 2010, The Journal of Neuroscience.

[10] L. Shams et al., Audiovisual integration in high functioning adults with autism, 2010.

[11] J. Schwartz, A reanalysis of McGurk data suggests that audiovisual fusion in speech perception is subject-dependent, 2010, The Journal of the Acoustical Society of America.

[12] Luc H. Arnal et al., Dual Neural Routing of Visual Facilitation in Speech Processing, 2009, The Journal of Neuroscience.

[13] Angela J. Yu et al., Dynamics of attentional selection under conflict: toward a rational Bayesian account, 2009, Journal of Experimental Psychology: Human Perception and Performance.

[14] S. Soto-Faraco et al., Deconstructing the McGurk–MacDonald illusion, 2009, Journal of Experimental Psychology: Human Perception and Performance.

[15] Mikko Sams, The role of visual spatial attention in audiovisual speech perception, 2009, Speech Communication.

[16] L. Bernstein et al., Quantified acoustic–optical speech signal incongruity identifies cortical sites of audiovisual speech processing, 2008, Brain Research.

[17] D. Burnham et al., Impact of language on development of auditory-visual speech perception, 2008, Developmental Science.

[18] D. Poeppel et al., Temporal window of integration in auditory-visual speech perception, 2007, Neuropsychologia.

[19] Salvador Soto-Faraco et al., Attention to touch weakens audiovisual speech integration, 2007, Experimental Brain Research.

[20] Salvador Soto-Faraco et al., Conscious access to the unisensory components of a cross-modal illusion, 2007, NeuroReport.

[21] Jean Vroomen, Auditory grouping occurs prior to intersensory pairing: evidence from temporal ventriloquism, 2007, Experimental Brain Research.

[22] Jean-Luc Schwartz et al., The 0/0 problem in the fuzzy-logical model of perception, 2006, The Journal of the Acoustical Society of America.

[23] R. Campbell et al., Audiovisual Integration of Speech Falters under High Attention Demands, 2005, Current Biology.

[24] Charles Spence et al., Intramodal perceptual grouping modulates multisensory integration: evidence from the crossmodal dynamic capture task, 2005, Neuroscience Letters.

[25] David Poeppel, Visual speech speeds up the neural processing of auditory speech, 2005, Proceedings of the National Academy of Sciences of the United States of America.

[26] Jeesun Kim et al., Investigating the audio-visual speech detection advantage, 2004, Speech Communication.

[27] Christian Jutten et al., Developing an audio-visual speech source separation algorithm, 2004, Speech Communication.

[28] A. Fort et al., Bimodal speech: early suppressive visual effects in human auditory cortex, 2004, The European Journal of Neuroscience.

[29] Frédéric Berthommier, A phonetically neutral model of the low-level audio-visual interaction, 2004, Speech Communication.

[30] J. Schwartz, Seeing to hear better: evidence for early audio-visual interactions in speech identification, 2004, Cognition.

[31] J. Navarra, Assessing automaticity in audiovisual speech integration: evidence from the speeded classification task, 2004, Cognition.

[32] Tobias S. Andersen, Visual attention modulates audiovisual speech perception, 2004.

[33] Philip L. Smith, Psychology and neurobiology of simple decisions, 2004, Trends in Neurosciences.

[34] P. Bertelson, Visual Recalibration of Auditory Speech Identification, 2003, Psychological Science.

[35] Jeesun Kim, Hearing Foreign Voices: Does Knowing What is Said Affect Visual-Masked-Speech Detection?, 2003, Perception.

[36] P. Deltenre, Mismatch negativity evoked by the McGurk–MacDonald effect: a phonetic representation within short-term memory, 2002, Clinical Neurophysiology.

[37] P. F. Seitz, The use of visible speech cues for improving auditory detection of spoken sentences, 2000, The Journal of the Acoustical Society of America.

[39] C. Benoît, Effects of phonetic context on audio-visual intelligibility of French, 1994, Journal of Speech and Hearing Research.

[40] Y. Tohkura, Inter-language differences in the influence of visual cues in speech perception, 1993.

[41] Antoinette T. Gesi et al., Bimodal speech perception: an examination across languages, 1993.

[42] A. Meltzoff et al., Integrating speech information across talkers, gender, and sensory modality: Female faces and male voices in the McGurk effect, 1991, Perception & Psychophysics.

[43] Y. Tohkura, McGurk effect in non-English listeners: few visual effects for Japanese subjects hearing Japanese syllables of high auditory intelligibility, 1991, The Journal of the Acoustical Society of America.

[44] D. Massaro, Speech Perception By Ear and Eye: A Paradigm for Psychological Inquiry, 1989.

[45] Q. Summerfield, Detection and Resolution of Audio-Visual Incompatibility in the Perception of Vowels, 1984, The Quarterly Journal of Experimental Psychology A: Human Experimental Psychology.

[46] D. W. Massaro, Evaluation and Integration of Visual and Auditory Information in Speech Perception, 1983, Journal of Experimental Psychology: Human Perception and Performance.

[47] Q. Summerfield et al., Detection and resolution of audio-visual conflict in the perception of vowels, 1982.

[48] S. Pinker et al., Auditory streaming and the building of timbre, 1978, Canadian Journal of Psychology.

[49] H. McGurk et al., Hearing lips and seeing voices, 1976, Nature.

[50] N. P. Erber, Interaction of audition and vision in the recognition of oral speech stimuli, 1969, Journal of Speech and Hearing Research.

[51] W. H. Sumby et al., Visual contribution to speech intelligibility in noise, 1954.

[52] Jean Vroomen, Phonetic recalibration in audiovisual speech, 2012.

[53] Mikko Sams, Sound location can influence audiovisual speech perception when spatial attention is manipulated, 2011, Seeing and Perceiving.

[54] Slabu, Audiovisual speech binding: perception and brain activity as a function of synchronicity, 2007.

[55] Ilja Frissen, Visual recalibration of auditory spatial perception, 2005.

[56] Salvador Soto-Faraco, Assessing automaticity in audiovisual integration of speech, 2003.

[57] Christian Abry, Asking a naive question about the McGurk effect: Why does audio [b] give more [d] percepts with visual [g] than with visual [d]?, 2001, AVSP.

[58] Albert S. Bregman, Auditory Scene Analysis, 2001.

[59] Angela Fuster Duran, McGurk effect in Spanish and German listeners: influences of visual cues in the perception of Spanish and German conflicting audio-visual stimuli, 1995, EUROSPEECH.

[60] Mohamed Tahar Lallouache, Un poste "visage-parole" couleur : acquisition et traitement automatique des contours des lèvres [A color "face-speech" workstation: acquisition and automatic processing of lip contours], 1991.

[61] Q. Summerfield, Some preliminaries to a comprehensive account of audio-visual speech perception, 1987.

[62] Jeffrey N. Rouder, Modeling Response Times for Two-Choice Decisions, 1998.