Saliency prediction in the coherence theory of attention

Abstract

In the coherence theory of attention, introduced by Rensink, O’Regan, and Clark (2000), a coherence field is defined by a hierarchy of structures that support the activities taking place across the different stages of visual attention. At the interface between the low-level and mid-level processing stages of attention are the proto-objects. These are generated in parallel and collect features of the scene at a specific location and time. They fade away if their region is no longer attended. We introduce a method to model these structures computationally. Our model is grounded experimentally in data collected in dynamic 3D environments with the Gaze Machine, a gaze measurement framework. This framework records pupil motion at the required speed and projects the point of regard into 3D space (Pirri et al., 2011; Pizzoli et al., 2011). To generate proto-objects, the model is extended to vibrating circular membranes whose initial displacement is generated by the features selected by classification. The energy of the vibrating membranes is used to predict saliency in visual search tasks.
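The abstract does not give the membrane equations, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes the textbook clamped circular membrane: the wave equation u_tt = c² ∇²u on a disk of radius a, with u = 0 on the rim, released from rest. It also assumes a radially symmetric initial displacement u0(r), standing in for the feature-driven displacement described above. Expanding u0 in the modes J0(α_n r / a), where α_n is the n-th positive zero of the Bessel function J0, gives amplitudes A_n, and the conserved energy is E = ½ Σ ω_n² A_n² N_n, with ω_n = c α_n / a and N_n = π a² J1(α_n)². The function names (bessel_j, bessel_j0_zeros, simpson, membrane_energy), the five-mode truncation and the feature score are hypothetical choices for illustration; only the Python standard library is used.

```python
import math


def bessel_j(order, x, terms=80):
    """J_order(x), the Bessel function of the first kind, from its power series.

    Each term is obtained from the previous one, which avoids huge factorials.
    Accurate enough for the moderate arguments (x < ~20) used in this sketch.
    """
    term = (x / 2) ** order / math.factorial(order)
    total = term
    for k in range(terms):
        term *= -((x / 2) ** 2) / ((k + 1) * (k + 1 + order))
        total += term
    return total


def bessel_j0_zeros(count):
    """First `count` positive zeros of J0, found by bisection.

    (n - 1/4) * pi is the leading-order (McMahon) estimate of the n-th zero;
    a bracket of +/- 0.5 around it contains exactly one sign change.
    """
    zeros = []
    for n in range(1, count + 1):
        guess = (n - 0.25) * math.pi
        lo, hi = guess - 0.5, guess + 0.5
        for _ in range(60):
            mid = 0.5 * (lo + hi)
            if bessel_j(0, lo) * bessel_j(0, mid) <= 0:
                hi = mid  # sign change (zero) lies in [lo, mid]
            else:
                lo = mid  # otherwise it lies in [mid, hi]
        zeros.append(0.5 * (lo + hi))
    return zeros


def simpson(f, lo, hi, intervals=400):
    """Composite Simpson's rule on [lo, hi]; `intervals` must be even."""
    h = (hi - lo) / intervals
    total = f(lo) + f(hi)
    for i in range(1, intervals):
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return total * h / 3


def membrane_energy(u0, radius=1.0, speed=1.0, modes=5):
    """Energy of a clamped circular membrane released from rest (assumed model).

    u0(r) is a hypothetical radially symmetric initial displacement with
    u0(radius) = 0. It is projected onto the first `modes` radial modes
    J0(alpha_n r / radius); each mode contributes 0.5 * omega_n^2 * A_n^2 * N_n.
    """
    energy = 0.0
    for alpha in bessel_j0_zeros(modes):
        k = alpha / radius                                      # radial wavenumber
        norm = math.pi * radius ** 2 * bessel_j(1, alpha) ** 2  # N_n: integral of J0(k r)^2 over the disk
        proj = 2 * math.pi * simpson(lambda r: u0(r) * bessel_j(0, k * r) * r, 0.0, radius)
        amp = proj / norm                                       # modal amplitude A_n
        omega = speed * k                                       # angular frequency omega_n
        energy += 0.5 * omega ** 2 * amp ** 2 * norm
    return energy


# Hypothetical usage: a feature score of 0.8 sets the height of a parabolic cap.
score = 0.8
print(membrane_energy(lambda r: score * (1 - r * r)))
```

As a sanity check, a cap u0(r) = s(1 - r²) on the unit disk has exact energy π s², and five modes already recover it to within about 0.1%. Because the energy grows with the square of the displacement, a stronger feature response would, under this assumption, yield a quadratically larger saliency value.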

References

[1] J. Wolfe, et al. The role of categorization in visual search for orientation. Journal of Experimental Psychology: Human Perception and Performance, 1992.

[2] Ronald A. Rensink. The Dynamic Representation of Scenes. 2000.

[3] D. Marr, et al. Representation and recognition of the spatial organization of three-dimensional shapes. Proceedings of the Royal Society of London. Series B. Biological Sciences, 1978.

[4] Isabelle Guyon, et al. An Introduction to Variable and Feature Selection. J. Mach. Learn. Res., 2003.

[5] Fiora Pirri, et al. 3D Saliency maps. CVPR 2011 Workshops, 2011.

[6] Stephen Roberts, et al. Finite element thin plate splines for surface fitting. 1997.

[7] L. Itti, et al. Visual causes versus correlates of attentional selection in dynamic scenes. Vision Research, 2006.

[8] Giulio Sandini, et al. A Proto-object Based Visual Attention Model. WAPCV, 2008.

[9] Marie desJardins, et al. Data Clustering with a Relational Push-Pull Model. 2007.

[10] Laurent Itti, et al. Robot steering with spectral image information. IEEE Transactions on Robotics, 2005.

[11] Gang Kou, et al. Feature Selection for Nonlinear Kernel Support Vector Machines. 2007.

[12] Heinz Hügli, et al. Model Performance for Visual Attention in Real 3D Color Scenes. IWINAC, 2005.

[13] U. Neisser, et al. Selective looking: Attending to visually specified events. Cognitive Psychology, 1975.

[14] Paul D. Fiore, et al. Efficient Linear Solution of Exterior Orientation. IEEE Trans. Pattern Anal. Mach. Intell., 2001.

[15] Eileen Kowler. Eye movements: The past 25 years. Vision Research, 2011.

[16] A. Watson, et al. A standard model for foveal detection of spatial contrast. Journal of Vision, 2005.

[17] Ronald A. Rensink, et al. To See or Not to See: The Need for Attention to Perceive Changes in Scenes. 1997.

[18] Yasunari Yokota, et al. Facilitation of perceptual filling-in for spatio-temporal frequency of dynamic textures. 2005.

[19] L. Stark, et al. Closely spaced saccades. Investigative Ophthalmology, 1975.

[20] Christof Koch, et al. Predicting human gaze using low-level saliency combined with face detection. NIPS, 2007.

[21] J. Wolfe. The Parallel Guidance of Visual Attention. 1992.

[22] John K. Tsotsos, et al. Modeling Visual Attention via Selective Tuning. Artif. Intell., 1995.

[23] Ronald A. Rensink, et al. On the Failure to Detect Changes in Scenes Across Brief Interruptions. 2000.

[24] S. Ullman, et al. Shifts in selective visual attention: towards the underlying neural circuitry. Human Neurobiology, 1985.

[25] Manolis I. A. Lourakis, et al. SBA: A software package for generic sparse bundle adjustment. TOMS, 2009.

[26] John K. Tsotsos, et al. Attention in Cognitive Systems, 5th International Workshop on Attention in Cognitive Systems, WAPCV 2008, Fira, Santorini, Greece, May 12, 2008, Revised Selected Papers. WAPCV, 2009.

[27] Minoru Asada, et al. Image feature generation by visio-motor map learning towards selective attention. Proceedings 2001 IEEE/RSJ International Conference on Intelligent Robots and Systems. Expanding the Societal Role of Robotics in the Next Millennium (Cat. No.01CH37180), 2001.

[28] B. Julesz, et al. Texton gradients: The texton theory revisited. Biological Cybernetics, 2004.

[29] Nuno Vasconcelos, et al. Spatiotemporal Saliency in Dynamic Scenes. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2010.

[30] J. Duncan, et al. Visual search and stimulus similarity. Psychological Review, 1989.

[31] Christof Koch, et al. A Model of Saliency-Based Visual Attention for Rapid Scene Analysis. 2009.

[32] Robert Tibshirani, et al. 1-norm Support Vector Machines. NIPS, 2003.

[33] Alessandro Rudi, et al. A general method for the point of regard estimation in 3D space. CVPR, 2011.

[34] L. Stark, et al. The trajectories of saccadic eye movements. Scientific American, 1979.

[35] Christof Koch, et al. Modeling attention to salient proto-objects. Neural Networks, 2006.

[36] Garrison W. Cottrell, et al. Visual saliency model for robot cameras. 2008 IEEE International Conference on Robotics and Automation, 2008.

[37] Alexander J. Smola, et al. Advances in Large Margin Classifiers. 2000.

[38] Olvi L. Mangasarian, et al. Exact 1-Norm Support Vector Machines Via Unconstrained Convex Differentiable Minimization. J. Mach. Learn. Res., 2006.

[39] A. Treisman, et al. A feature-integration theory of attention. Cognitive Psychology, 1980.

[40] P. Torr. Geometric motion segmentation and model selection. Philosophical Transactions of the Royal Society of London. Series A: Mathematical, Physical and Engineering Sciences, 1998.

[41] S. Yantis, et al. Selective visual attention and perceptual coherence. Trends in Cognitive Sciences, 2006.

[42] Ali Shokoufandeh, et al. Landmark Selection for Vision-Based Navigation. IEEE Trans. Robotics, 2006.

[43] Andrew W. Fitzgibbon, et al. Bundle Adjustment - A Modern Synthesis. Workshop on Vision Algorithms, 1999.

[44] Wei Zhou, et al. An updated Time-Optimal 3rd-Order Linear Saccadic Eye Plant Model. Int. J. Neural Syst., 2009.

[45] Bernhard Schölkopf, et al. Estimating the Support of a High-Dimensional Distribution. Neural Computation, 2001.

[46] Fiora Pirri, et al. From Saliency to Eye Gaze: Embodied Visual Selection for a Pan-Tilt-Based Robotic Head. ISVC, 2011.

[47] Robert C. Bolles, et al. Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography. CACM, 1981.

[48] Anne Treisman, et al. Preattentive processing in vision. Computer Vision, Graphics, and Image Processing, 1985.

[49] J. Wolfe, et al. Guided Search 2.0: A revised model of visual search. Psychonomic Bulletin & Review, 1994.

[50] Fiora Pirri, et al. Bottom-Up Gaze Shifts and Fixations Learning by Imitation. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), 2007.

[51] Nello Cristianini, et al. Kernel Methods for Pattern Analysis. ICTAI, 2003.

[52] Vladimir N. Vapnik, et al. The Nature of Statistical Learning Theory. Statistics for Engineering and Information Science, 2000.