Different types of sounds influence gaze differently in videos

This paper analyzes how different types of sound affect visual gaze when a person freely views videos, an effect that could help predict eye position. To test the effect of sound, an audio-visual experiment was designed with two groups of participants: one group in an audio-visual (AV) condition and one in a visual (V) condition. Using statistical tools, we analyzed the difference between the eye positions of participants in the AV and V conditions. We observed that the effect of sound depends on the type of sound, and that the classes containing human voice (i.e. speech, singer, human noise, and singers) have the greatest effect. Furthermore, the distance between the sound source and the eye positions of the AV group suggests that only particular types of sound draw eye position toward the sound source. Finally, an analysis of fixation durations in the AV and V conditions showed that participants in the AV condition moved their eyes more frequently than those in the V condition.
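The abstract does not detail the statistical tools used to compare the two groups. As an illustration only, the sketch below shows how a two-sample permutation (randomization) test could compare a per-participant summary measure, such as the mean distance between eye position and sound source, or the mean fixation duration, between the AV and V groups; the function name and inputs are hypothetical and are not taken from the paper.

```python
import numpy as np

def permutation_test(av_values, v_values, n_perm=10000, rng=None):
    """Two-sample permutation test on the difference of means.

    av_values, v_values: 1-D arrays of per-participant summary measures
    (e.g. mean eye-position-to-sound-source distance, or mean fixation
    duration) for the AV and V groups. Returns the observed difference
    and a two-sided p-value.
    """
    rng = np.random.default_rng() if rng is None else rng
    av_values = np.asarray(av_values, dtype=float)
    v_values = np.asarray(v_values, dtype=float)

    # Observed group difference (AV minus V).
    observed = av_values.mean() - v_values.mean()

    # Pool both groups and repeatedly reassign participants at random.
    pooled = np.concatenate([av_values, v_values])
    n_av = len(av_values)

    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = pooled[:n_av].mean() - pooled[n_av:].mean()
        if abs(diff) >= abs(observed):
            count += 1

    # Add-one correction keeps the p-value strictly positive.
    p_value = (count + 1) / (n_perm + 1)
    return observed, p_value
```

Because group labels are shuffled rather than assuming a particular distribution, such a test makes no normality assumption about the AV-versus-V difference, which suits small participant groups.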
