Audio-visual attention: Eye-tracking dataset and analysis toolbox

Although many visual attention models have been proposed, very few saliency models have investigated the impact of audio information. To develop audio-visual attention models, researchers need a ground truth of eye movements recorded while observers explore complex natural scenes under different audio conditions. They also need tools to compare eye movements and gaze patterns between these conditions. This paper answers both needs by proposing a new eye-tracking dataset and an associated analysis ToolBox that implements common metrics for analyzing eye movements. Our eye-tracking dataset contains the eye positions gathered during four eye-tracking experiments. A total of 176 observers were recorded while exploring 148 videos (mean duration = 22 s) split between different audio conditions (with or without sound) and visual categories (moving objects, landscapes, and faces). Our ToolBox visualizes the temporal evolution of different metrics computed from the recorded eye positions. Both the dataset and the ToolBox are freely available to help design and assess visual saliency models for audiovisual dynamic stimuli.
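To illustrate the kind of metric such a ToolBox computes over time, here is a minimal sketch (in Python, not the ToolBox's own code) of gaze dispersion, a metric commonly used to compare how tightly observers' eye positions cluster under different audio conditions. The data layout (an array of shape `n_observers × n_frames × 2`) and the variable names `gaze_sound` and `gaze_mute` are assumptions made for this example, not the dataset's actual format.

```python
import numpy as np

def dispersion(gaze, frame):
    """Mean pairwise Euclidean distance between observers' gaze
    points on one frame. `gaze` is assumed to have shape
    (n_observers, n_frames, 2), with NaN for missing samples."""
    pts = gaze[:, frame, :]
    pts = pts[~np.isnan(pts).any(axis=1)]      # drop missing samples
    diffs = pts[:, None, :] - pts[None, :, :]  # all pairwise differences
    d = np.sqrt((diffs ** 2).sum(axis=-1))     # pairwise distance matrix
    n = len(pts)
    # average over the upper triangle (each observer pair counted once)
    return d[np.triu_indices(n, k=1)].mean() if n > 1 else np.nan

# Hypothetical usage: temporal evolution of dispersion per condition.
# Lower dispersion means observers' gaze is more tightly clustered.
# disp_sound = [dispersion(gaze_sound, f) for f in range(n_frames)]
# disp_mute  = [dispersion(gaze_mute,  f) for f in range(n_frames)]
```

Plotting such per-frame values side by side for the two audio conditions is one way to visualize whether sound draws observers' gaze toward common locations over the course of a video.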
