A combined model for scan path in pedestrian searching

Target searching, i.e. quickly locating target objects in images or videos, has attracted much attention in computer vision. A comprehensive understanding of the factors that influence human visual search is essential for designing target-searching algorithms for computer vision systems. In this paper, we propose a combined model that generates scan paths for computer vision systems to follow when searching for targets in images. The model explores and integrates three factors that influence human visual search: top-down target information, spatial context, and bottom-up visual saliency. The effectiveness of the combined model is evaluated by comparing the generated scan paths with human fixation sequences recorded while locating targets in the same images. The same evaluation strategy is also used to learn the optimal weighting coefficients of the factors through linear search. In addition, the performance of each individual factor and of their arbitrary combinations is examined. Extensive experiments show that top-down target information is the most important factor influencing the accuracy of target searching, while the effect of bottom-up visual saliency is limited. Any combination of the three factors performs better than each single component factor. The scan paths produced by the proposed model are the closest to human fixation sequences, and in this sense optimal.
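The abstract describes a weighted linear combination of three factor maps, with coefficients learned by linear search, from which a scan path is generated. A minimal sketch of that idea is below; the specific weight values, the map normalization, and the inhibition-of-return step used to pick successive fixations are all assumptions for illustration, not details taken from the paper.

```python
import numpy as np

def combined_priority_map(target_map, context_map, saliency_map,
                          weights=(0.6, 0.3, 0.1)):
    """Weighted linear combination of the three factor maps.

    The weight values here are placeholders; the paper learns the
    coefficients by linear search against human fixation sequences.
    """
    maps = (target_map, context_map, saliency_map)
    # Normalize each map to [0, 1] so the weights are comparable (an assumption).
    norm = [(m - m.min()) / (m.max() - m.min() + 1e-12) for m in maps]
    return sum(w * m for w, m in zip(weights, norm))

def generate_scan_path(priority, n_fixations=5, ior_radius=2):
    """Generate a scan path by repeatedly fixating the current maximum
    of the priority map and suppressing a square neighborhood around it
    (inhibition of return -- an assumed mechanism, not specified here)."""
    p = priority.copy()
    h, w = p.shape
    path = []
    for _ in range(n_fixations):
        y, x = np.unravel_index(np.argmax(p), p.shape)
        path.append((int(y), int(x)))
        # Suppress the fixated region so the next maximum is elsewhere.
        y0, y1 = max(0, y - ior_radius), min(h, y + ior_radius + 1)
        x0, x1 = max(0, x - ior_radius), min(w, x + ior_radius + 1)
        p[y0:y1, x0:x1] = -np.inf
    return path
```

A generated path can then be compared fixation-by-fixation with a recorded human fixation sequence, which is the evaluation strategy the abstract uses both to score the model and to select the weighting coefficients.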
