Learning a time-dependent master saliency map from eye-tracking data in videos

To predict the most salient regions of complex natural scenes, saliency models commonly compute several feature maps (contrast, orientation, motion, etc.) and linearly combine them into a master saliency map. Since feature maps differ in spatial distribution and amplitude dynamic range, determining their contributions to overall saliency remains an open problem. Most state-of-the-art models do not take time into account and give feature maps constant weights across the stimulus duration. However, visual exploration is a highly dynamic process shaped by many time-dependent factors. For instance, some systematic viewing patterns, such as the center bias, are known to vary dramatically over the time course of the exploration. In this paper, we use maximum likelihood and shrinkage methods to dynamically and jointly learn feature map and systematic viewing pattern weights directly from eye-tracking data recorded on videos. We show that these weights systematically vary as a function of time and depend heavily upon the semantic visual category of the videos being processed. Our fusion method takes these variations into account and outperforms other state-of-the-art fusion schemes that use constant weights over time. The code, videos and eye-tracking data we used for this study are available online.
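As a rough illustration of what learning time-dependent fusion weights by maximum likelihood can look like, the sketch below treats each normalised feature map as one component of a mixture and estimates one set of mixture weights per frame from the recorded fixations, using expectation-maximisation. This is a minimal sketch under stated assumptions, not the implementation used in this study: the function names (`normalise`, `em_weights`, `time_dependent_weights`, `master_map`), the array layout, and the fixed iteration count are all hypothetical, weights are assumed non-negative and summing to one, and the shrinkage step mentioned above is left out entirely.

```python
import numpy as np

def normalise(maps):
    """Rescale each (H, W) map in a (K, H, W) stack so it sums to one."""
    maps = np.asarray(maps, dtype=float)
    maps = maps - maps.min(axis=(1, 2), keepdims=True)
    maps = maps + 1e-12  # avoid exact zeros in the likelihood
    return maps / maps.sum(axis=(1, 2), keepdims=True)

def em_weights(maps, fixations, n_iter=100):
    """Maximum-likelihood mixture weights for one frame.

    maps:      (K, H, W) feature maps (assumed layout).
    fixations: (N, 2) integer array of (row, col) gaze positions.
    """
    p = normalise(maps)
    rows, cols = fixations[:, 0], fixations[:, 1]
    lik = p[:, rows, cols].T                  # (N, K): p_k at each fixation
    w = np.full(p.shape[0], 1.0 / p.shape[0])
    for _ in range(n_iter):
        resp = lik * w                        # E-step: responsibilities
        resp /= resp.sum(axis=1, keepdims=True)
        w = resp.mean(axis=0)                 # M-step: updated weights
    return w

def time_dependent_weights(maps_per_frame, fixations_per_frame):
    """One weight vector per frame, stacked into a (T, K) array."""
    return np.array([em_weights(m, f)
                     for m, f in zip(maps_per_frame, fixations_per_frame)])

def master_map(maps, w):
    """Linear combination of the normalised maps with weights w."""
    return np.tensordot(w, normalise(maps), axes=1)

# Toy data: two feature maps, each with a single peak.
H = W = 8
m0 = np.zeros((H, W)); m0[1, 1] = 1.0
m1 = np.zeros((H, W)); m1[6, 6] = 1.0
maps = np.stack([m0, m1])

# Observers look at the first peak in frame 0 and the second in frame 1.
fix0 = np.array([[1, 1]] * 5)
fix1 = np.array([[6, 6]] * 5)

weights = time_dependent_weights([maps, maps], [fix0, fix1])
fused = master_map(maps, weights[0])
```

In this toy setting the learned weights shift from the first map to the second between the two frames, which is the kind of variation a constant-weight fusion scheme cannot represent. In practice one would likely pool fixations over a short temporal window rather than a single frame, and add a regulariser to discard uninformative maps; both choices are assumptions beyond what the abstract states.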
