Audio-Visual Integration with Competing Sources in the Framework of Audio-Visual Speech Scene Analysis

We introduce "Audio-Visual Speech Scene Analysis" (AVSSA) as an extension of the two-stage Auditory Scene Analysis model to audiovisual scenes made of mixtures of speakers. AVSSA assumes that a coherence index between the auditory and the visual inputs is computed prior to audiovisual fusion, making it possible to determine whether the sensory inputs should be bound together. Previous experiments, in which the McGurk effect was modulated by coherent vs. incoherent audiovisual contexts presented before the McGurk target, have provided experimental evidence for AVSSA: incoherent contexts decrease the McGurk effect, suggesting that they yield a lower audiovisual coherence index and hence less audiovisual fusion. The present experiments extend the AVSSA paradigm by creating contexts made of competing audiovisual sources and measuring their effect on McGurk targets. The competing audiovisual sources have respectively high and low audiovisual coherence (that is, large vs. small audiovisual comodulations in time). The first experiment involves contexts made of two auditory sources and one video source associated with either the first or the second auditory source. The McGurk effect is smaller after contexts in which the visual source is associated with the less coherent auditory source. In the second experiment, using the same stimuli, participants are asked to attend to one source or the other. The data show that the modulation of fusion depends on the attentional focus. Altogether, these two experiments shed light on audiovisual binding, the AVSSA process, and the role of attention.
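
The abstract leaves the coherence index unspecified beyond "audiovisual comodulations in time". Below is a minimal Python sketch of one plausible operationalization, assuming the index is computed as the zero-lag correlation between the slow acoustic amplitude envelope and a lip-aperture time series; the function names (amplitude_envelope, av_coherence_index) and all parameters are illustrative assumptions, not the authors' actual computation.

```python
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

def amplitude_envelope(audio, fs, cutoff_hz=10.0):
    """Slow amplitude envelope: Hilbert magnitude, then low-pass filter.
    The 10 Hz cutoff is an assumption, chosen to retain syllabic-rate
    modulations."""
    env = np.abs(hilbert(audio))
    b, a = butter(2, cutoff_hz / (fs / 2.0), btype="low")
    return filtfilt(b, a, env)

def av_coherence_index(audio, fs_audio, lip_aperture, fs_video):
    """Hypothetical coherence index: zero-lag Pearson correlation between
    the acoustic envelope and a lip-aperture time series, with the
    envelope resampled to the video frame times."""
    env = amplitude_envelope(audio, fs_audio)
    t_video = np.arange(len(lip_aperture)) / fs_video
    t_audio = np.arange(len(audio)) / fs_audio
    env_at_video = np.interp(t_video, t_audio, env)
    return np.corrcoef(env_at_video, lip_aperture)[0, 1]

if __name__ == "__main__":
    # Synthetic demo: a 4 Hz "syllabic" modulation drives both streams.
    np.random.seed(0)
    fs_a, fs_v, dur = 16000, 30, 2.0
    t_a = np.arange(int(fs_a * dur)) / fs_a
    t_v = np.arange(int(fs_v * dur)) / fs_v
    modulation = 0.5 * (1 + np.sin(2 * np.pi * 4 * t_a))
    audio = modulation * np.random.randn(len(t_a))        # comodulated audio
    lips_coherent = 0.5 * (1 + np.sin(2 * np.pi * 4 * t_v))  # matched lips
    lips_incoherent = np.random.rand(len(t_v))            # unrelated lips
    print("coherent  :", av_coherence_index(audio, fs_a, lips_coherent, fs_v))
    print("incoherent:", av_coherence_index(audio, fs_a, lips_incoherent, fs_v))
```

On this view, a video track that comodulates with its audio track yields an index near 1, while a mismatched pairing yields an index near 0, which is the contrast the competing-source contexts are built on.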
