Glimpse: A Gaze-Based Measure of Temporal Salience

Temporal salience considers how visual attention varies over time. Although visual salience has been widely studied from a spatial perspective, its temporal dimension has been mostly ignored, despite being of utmost importance for understanding how attention evolves over dynamic content. To address this gap, we propose Glimpse, a novel measure that computes temporal salience from the observer spatio-temporal consistency of raw gaze data. The measure is conceptually simple, training-free, and provides a semantically meaningful quantification of visual attention over time. As an extension, we explore scoring algorithms that estimate temporal salience from spatial salience maps predicted by existing computational models; however, these approaches generally fall short of our gaze-based measure. Glimpse could serve as a basis for downstream tasks such as video segmentation and summarization. Glimpse's software and data are publicly available.
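The abstract does not detail how observer spatio-temporal consistency is computed; as a hypothetical illustration only (not the paper's actual measure), one simple way to score a frame is the inverse of the mean pairwise distance between observers' gaze points, so frames where observers look at the same place score high:

```python
import numpy as np

def temporal_salience(gaze, eps=1e-6):
    """Illustrative per-frame salience from gaze consistency (assumption,
    not the Glimpse measure itself).

    gaze: array of shape (frames, observers, 2) with (x, y) gaze points.
    Returns an array of shape (frames,) in [0, 1]; higher means observers'
    gaze was more spatially consistent on that frame.
    """
    dispersions = []
    for pts in gaze:
        # Pairwise Euclidean distances between all observers' gaze points.
        diffs = pts[:, None, :] - pts[None, :, :]
        dist = np.sqrt((diffs ** 2).sum(axis=-1))
        n = len(pts)
        # Mean over the n*(n-1) off-diagonal pairs (self-distances are zero).
        dispersions.append(dist.sum() / (n * (n - 1)))
    disp = np.array(dispersions)
    # Min-max normalize the dispersion and invert it:
    # low dispersion (consistent gaze) -> high salience.
    disp = (disp - disp.min()) / (disp.max() - disp.min() + eps)
    return 1.0 - disp
```

A frame where all observers fixate near the same point yields a score close to 1, while a frame with widely scattered gaze yields a score close to 0; the actual measure additionally accounts for the temporal dimension of the raw gaze data.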
