Audiovisual robustness: exploring perceptual tolerance to asynchrony and quality distortion

Rules of thumb for noticeable and detrimental asynchrony between audio and video streams have long been established through the contributions of several studies. Although these studies share similar findings, none have made any explicit assumptions regarding audio and video quality. Given the use of active adaptation in current and upcoming streaming systems, audio and video will continue to be delivered in separate streams; consequently, the assumption that the rules of thumb hold independently of quality needs to be challenged. To put this assumption to the test, we focus on the detection, not the appraisal, of asynchrony at different levels of distortion. Cognitive psychologists use the term temporal integration to describe the failure to detect asynchrony; it refers to a perceptual process with an inherent buffer for short asynchronies, within which corresponding auditory and visual signals are merged into a single experience. Accordingly, this paper discusses relevant causes of and concerns about asynchrony, introduces research on audiovisual perception, and then explores the impact of audio and video quality on the temporal integration of different audiovisual events. Three content types are explored: speech from a news broadcast, music performed by a drummer, and physical action in the form of a chess game. Within these contexts, we found temporal integration to be very robust to quality discrepancies between the two modalities. In fact, asynchrony detection thresholds varied considerably more between the different content types than they did between distortion levels. Nevertheless, our findings indicate that the assumption that asynchrony perception is independent of audiovisual quality may have to be reconsidered.
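As a concrete illustration of how such asynchrony detection thresholds are commonly derived (a minimal sketch, not the analysis pipeline used in this paper), the Python snippet below fits a Gaussian-shaped psychometric curve to simultaneity-judgment data and reads off the audio-leading and audio-lagging offsets at which the curve falls to half of its peak. The offsets, response proportions, and the 50%-of-peak criterion are all hypothetical values chosen for illustration.

# Minimal sketch: estimating asynchrony detection thresholds from a
# simultaneity-judgment task. All data values below are hypothetical.
import numpy as np
from scipy.optimize import curve_fit

# Audio-video offsets in ms (negative = audio leads, positive = audio lags)
offsets = np.array([-300, -200, -100, -50, 0, 50, 100, 200, 300], dtype=float)
# Hypothetical proportion of trials judged "in sync" at each offset
p_sync = np.array([0.10, 0.30, 0.75, 0.90, 0.95, 0.92, 0.85, 0.55, 0.20])

def gaussian(x, peak, mu, sigma):
    # Peak response at the point of subjective simultaneity (mu),
    # falling off with temporal distance (width sigma).
    return peak * np.exp(-((x - mu) ** 2) / (2 * sigma ** 2))

# Fit the psychometric curve to the observed proportions
params, _ = curve_fit(gaussian, offsets, p_sync, p0=[1.0, 0.0, 150.0])
peak, mu, sigma = params

# Detection thresholds: offsets where the fitted curve crosses half its peak
half_width = sigma * np.sqrt(2 * np.log(2))
audio_lead_threshold = mu - half_width   # audio-leading side
audio_lag_threshold = mu + half_width    # audio-lagging side

print(f"Point of subjective simultaneity: {mu:.1f} ms")
print(f"Detection thresholds: {audio_lead_threshold:.1f} ms (audio leads), "
      f"{audio_lag_threshold:.1f} ms (audio lags)")

A symmetric Gaussian is used here only for simplicity; empirically, the temporal integration window is typically asymmetric, with larger tolerance for lagging than for leading audio, so an asymmetric fit would normally be preferred.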
