Multisensory Integration: The Case of a Time Window of Gesture–Speech Integration

This experiment investigates the integration of gesture and speech from a multisensory perspective. In a disambiguation paradigm, participants were presented with short videos of an actress uttering sentences like “She was impressed by the BALL, because the GAME/DANCE….” The ambiguous noun (BALL) was accompanied by an iconic gesture fragment containing information that disambiguated the noun toward its dominant or subordinate meaning. We used four different temporal alignments between noun and gesture fragment: the identification point (IP) of the noun either preceded (+120 msec), coincided with (0 msec), or followed the end of the gesture fragment (−200 and −600 msec). ERPs time-locked to the IP of the noun showed significant differences between the integration of dominant and subordinate gesture fragments in the −200, 0, and +120 msec conditions. The outcome of this integration was revealed at the target words. These data suggest a time window for direct semantic gesture–speech integration ranging from at least −200 up to +120 msec. Although the −600 msec condition showed no signs of direct integration at the homonym, significant disambiguation was found at the target word. An exploratory analysis suggested that the gesture information was directly integrated at the verb, indicating that there are multiple positions in a sentence where direct gesture–speech integration takes place. Ultimately, this implies that in natural communication, where a gesture unfolds over some time, several aspects of that gesture will have specific and possibly distinct effects at different positions in an utterance.
