Audiovisual Speech Scene Analysis in the Context of Competing Sources

Audiovisual fusion in speech perception is generally conceived as a process independent of scene analysis, which is assumed to occur separately within the auditory and visual domains. In contrast, we have proposed in recent years that scene analysis, such as the grouping that takes place in the cocktail party effect, is itself an audiovisual process. Here we review a series of experiments illustrating how audiovisual speech scene analysis operates in the context of competing sources. We show that a short contextual audiovisual stimulus composed of competing auditory and visual sources modifies the perception of a subsequent McGurk target. We interpret these findings in terms of binding, unbinding and rebinding processes, and we show how these processes depend on audiovisual temporal correlations, on attention, and on differences between younger and older participants.
