Influence of audio-visual attention on perceived quality of standard definition multimedia content

When human subjects assess the quality of multimedia data, high-level perceptual processes such as Focus of Attention (FoA) and eye movements are believed to play an important role. While prior art reports the incorporation of visual FoA into objective quality metrics, audio-visual FoA has rarely been addressed or utilized, despite the presence and importance of both audio and video information in many multimedia systems. This paper explores the influence of audio-visual FoA on the perceived quality of standard definition audio-visual sequences. Results of a subjective quality assessment study are reported, showing that the sound source attracts visual attention, so that visual degradation in regions far from the source is perceived less than degradation in sound-emitting regions.
