Temporal integration for live conversational speech

The difficulty of detecting short asynchronies between corresponding audio and video signals demonstrates the remarkable resilience of the perceptual system when integrating the senses. Thresholds for perceived synchrony vary with the complexity, congruency and predictability of the audiovisual event. For instance, asynchrony is typically detected at shorter offsets for simple flash and tone combinations than for speech stimuli. In applied scenarios such as teleconference platforms, the thresholds themselves are of particular interest: because the transmission of audio and video streams can result in temporal misalignments, system providers need to establish how much delay they can allow. This study compares the perception of synchrony in speech in a live two-way teleconference scenario and in a controlled experimental set-up. Although methodologies and measures differ, our exploratory analysis indicates that the windows of temporal integration are similar for the two scenarios. Nevertheless, the direction of temporal tolerance differs: audio lead asynchrony was more difficult to detect in the teleconference than in the experimental speech videos. While the windows of temporal integration are fairly independent of the context, the skew in the audio lead threshold may reflect the natural diversion of attention that comes with taking part in a conversation.

Index Terms: audiovisual speech, temporal integration, synchrony perception, teleconference
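The abstract separates the width of the window of temporal integration from its direction, that is, from where the audio lead and video lead thresholds fall. As a minimal sketch of how these quantities relate, the Python snippet below assumes a simultaneity-judgment design in which the proportion of "synchronous" responses is recorded at each audiovisual offset. It takes each threshold as the offset where that proportion crosses 50%, found by linear interpolation. The function names, the sign convention (negative offsets for audio lead), the 50% criterion and the response proportions are all illustrative assumptions. They are not the study's method or data.

```python
# Hypothetical sketch: estimating a window of temporal integration from
# synchrony judgments. Assumed convention: offsets in milliseconds, negative
# values mean the audio leads the video, positive values mean the video leads.


def crossing(x0, y0, x1, y1, level):
    """Linearly interpolate the offset where the segment crosses `level`."""
    return x0 + (level - y0) * (x1 - x0) / (y1 - y0)


def integration_window(offsets, p_sync, level=0.5):
    """Return (audio_lead_threshold, video_lead_threshold, width, centre).

    `offsets` must be sorted ascending; `p_sync` holds the proportion of
    "synchronous" responses at each offset. The curve is assumed to rise
    above `level` once on the audio-lead side and fall below it once on the
    video-lead side.
    """
    pairs = list(zip(offsets, p_sync))
    audio_lead = video_lead = None
    for (x0, y0), (x1, y1) in zip(pairs, pairs[1:]):
        if audio_lead is None and y0 < level <= y1:
            # Rising edge: the audio-lead boundary of the window.
            audio_lead = crossing(x0, y0, x1, y1, level)
        elif audio_lead is not None and y0 >= level > y1:
            # Falling edge: the video-lead boundary of the window.
            video_lead = crossing(x0, y0, x1, y1, level)
            break
    if audio_lead is None or video_lead is None:
        raise ValueError("response curve does not cross the criterion on both sides")
    width = video_lead - audio_lead
    centre = (audio_lead + video_lead) / 2  # > 0 means skewed towards video lead
    return audio_lead, video_lead, width, centre


if __name__ == "__main__":
    # Invented response proportions, for illustration only.
    offsets = [-300, -200, -100, 0, 100, 200, 300]
    p_sync = [0.10, 0.30, 0.80, 0.95, 0.90, 0.60, 0.20]
    a, v, w, c = integration_window(offsets, p_sync)
    print(f"audio lead {a:.1f} ms, video lead {v:.1f} ms, width {w:.1f} ms, centre {c:.1f} ms")
```

With these invented proportions the window spans roughly -160 ms to +225 ms, and its centre sits on the video-lead side. Comparing the centre, or the two thresholds separately, across conditions distinguishes a change in the direction of tolerance from a change in the width of the window.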
