How different kinds of sound in videos can influence gaze

This paper presents an analysis of the effect of thirteen different kinds of sound on visual gaze during free viewing of videos, with the aim of improving eye-position prediction. First, an audio-visual experiment was designed with two groups of participants, under audio-visual (AV) and visual-only (V) conditions, to test the effect of sound. Then, an audio-only experiment was designed to validate the sound classification we proposed. We observed that the effect of sound differs depending on its kind, and that the classes containing human voice (speech, singer, human noise and singers) have the greatest effect. Finally, eye positions were compared with the predictions of a visual saliency model, showing that adding sound to video decreases the prediction accuracy of the visual saliency model.
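The abstract does not state which metric was used to compare recorded eye positions against the saliency model's output, but a common choice for this kind of comparison is the normalized scanpath saliency (NSS): the saliency map is z-scored and averaged at the fixated pixels, so higher values mean the map better predicts where people looked. A minimal sketch (function name and toy data are hypothetical, not from the paper):

```python
import numpy as np

def nss(saliency_map, fixations):
    """Normalized scanpath saliency: mean of the z-scored saliency
    map sampled at the recorded fixation points."""
    z = (saliency_map - saliency_map.mean()) / saliency_map.std()
    rows, cols = zip(*fixations)
    return float(z[list(rows), list(cols)].mean())

# Toy example: a map that peaks exactly where the fixation landed
# scores above zero; a fixation on a non-salient pixel scores below.
sal = np.zeros((4, 4))
sal[1, 1] = 1.0
print(nss(sal, [(1, 1)]))  # positive: fixation on the salient peak
print(nss(sal, [(0, 0)]))  # negative: fixation off the peak
```

A drop in such a score for the AV group relative to the V group would express the paper's finding that sound pulls gaze away from what a purely visual model predicts.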
