UNDERSTANDING AND PREDICTING HUMAN VISUAL ATTENTION

Understanding how the human visual system works is essential for many applications in computer vision, computer graphics, computational photography, psychology, sociology, and human-computer interaction. To give the research community easier and cheaper access to eye tracking data for developing and evaluating computational models of human visual attention, this thesis introduces a webcam-based gaze tracking system that supports large-scale eye tracking on a crowdsourcing platform. Using this tool, we also provide a benchmark data set for quantitatively comparing existing and future saliency prediction models. To explore where people look while performing complicated tasks in an interactive environment, we introduce a method to synthesize user interface layouts, present a computational model that predicts users' spatio-temporal visual attention on graphical user interfaces, and show that our model outperforms existing methods. In addition, we explore how visual stimuli affect brain signals measured with fMRI. Together, these contributions should be useful resources for future researchers building more powerful computational models of human visual attention: a tool for crowdsourced eye tracking, a large data set for scene image saliency, models for user interface layout synthesis and visual attention prediction, and a study of changes in brain connectivity driven by visual stimuli.
