Video viewing: do auditory salient events capture visual attention?

We assess whether salient auditory events in soundtracks modify eye movements during video exploration. In a previous study, we found that, on average, the nonspatial sound of video soundtracks influences eye movements. This result suggests that sound could play a leading role in visual attention models that predict eye movements. Here, we go further and test whether the effect of sound on eye movements is stronger just after salient auditory events. To automatically detect salient auditory events, we used two auditory saliency models: the discrete energy separation algorithm and the energy model. Both models produce a saliency curve over time, based on the fusion of several elementary audio features. The most salient auditory events were extracted by thresholding these curves. We then examined eye movement parameters just after these events, rather than over all video frames. We found that the effect of sound on eye movements (variability between observers' eye positions, saccade amplitude, and fixation duration) was no stronger after salient auditory events than on average over entire videos. We therefore suggest that sound influences visual exploration not only after salient events but in a more global way.
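The event-extraction step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the discrete Teager-Kaiser energy operator (the basis of the energy model, Kaiser 1990) as the elementary saliency feature and a fixed threshold for event onsets; function names and the thresholding rule are illustrative.

```python
import numpy as np

def teager_kaiser_energy(x):
    """Discrete Teager-Kaiser energy operator:
    Psi[x](n) = x(n)^2 - x(n-1) * x(n+1).
    Edge samples are copied from their neighbors."""
    x = np.asarray(x, dtype=float)
    psi = np.empty_like(x)
    psi[1:-1] = x[1:-1] ** 2 - x[:-2] * x[2:]
    psi[0] = psi[1]
    psi[-1] = psi[-2]
    return psi

def salient_events(saliency, threshold):
    """Return the indices where the saliency curve rises above the
    threshold (rising edges), taken here as salient-event onsets."""
    above = saliency > threshold
    onsets = np.flatnonzero(above[1:] & ~above[:-1]) + 1
    return onsets
```

In the actual models, several such feature curves (e.g., multiband energies) are fused into a single saliency curve before thresholding; the sketch shows only the operator-and-threshold skeleton.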
