Combining attention and recognition for rapid scene analysis

Bottom-up visual attention allows primates to quickly select regions of an image that contain salient objects. In artificial systems, restricting object recognition to these regions speeds up recognition and enables unsupervised learning of multiple objects in cluttered scenes. One remaining problem is that, during recognition, database objects superficially dissimilar to the target receive the same consideration as similar ones. Here we investigate rapid pruning of the recognition search space using the already-computed low-level features that guide attention. Itti and Koch’s bottom-up visual attention algorithm selects salient locations based on low-level features such as contrast, orientation, color, and intensity. Lowe’s SIFT recognition algorithm then extracts a signature of the attended object for comparison with the object database. The database search is prioritized toward objects that better match the low-level features that guided attention to the current recognition candidate, and the SIFT signatures of these prioritized objects are then checked for a match against the attended candidate. By comparing the performance of Lowe’s recognition algorithm combined with Itti and Koch’s bottom-up attention model, with and without search space pruning, we demonstrate that our pruning approach improves the speed of object recognition in complex natural scenes.
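The prioritized search described above can be illustrated with a minimal sketch. It is not the authors' implementation. The names `DatabaseObject`, `feature_distance`, and `prioritized_search` are hypothetical, as are the use of Euclidean distance between low-level feature vectors and the `max_checks` cutoff; the abstract does not specify the similarity measure or the pruning rule. The expensive signature comparison (SIFT matching in the paper) is passed in as a placeholder function.

```python
import math
from dataclasses import dataclass
from typing import Any, Callable, List, Optional, Sequence, Tuple


@dataclass
class DatabaseObject:
    """Hypothetical database entry (not from the paper)."""
    name: str
    low_level: Sequence[float]  # assumed summary of the attention features
    signature: Any              # stand-in for the object's SIFT signature


def feature_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two feature vectors (an assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def prioritized_search(
    candidate_low_level: Sequence[float],
    candidate_signature: Any,
    database: List[DatabaseObject],
    matches: Callable[[Any, Any], bool],
    max_checks: Optional[int] = None,
) -> Tuple[Optional[DatabaseObject], int]:
    """Check database objects in order of low-level feature similarity.

    Returns the first matching object (or None) and the number of
    expensive signature comparisons that were performed.
    """
    # Cheap step: rank objects by similarity of the already-computed features.
    ranked = sorted(
        database,
        key=lambda obj: feature_distance(candidate_low_level, obj.low_level),
    )
    # Optional pruning: drop the least similar objects from the search.
    if max_checks is not None:
        ranked = ranked[:max_checks]

    checks = 0
    for obj in ranked:
        checks += 1
        # Expensive step: compare signatures only for prioritized objects.
        if matches(candidate_signature, obj.signature):
            return obj, checks
    return None, checks


# Toy usage: signatures are plain strings and matching is simple equality.
database = [
    DatabaseObject("car", [0.9, 0.1, 0.2], "sig-car"),
    DatabaseObject("mug", [0.2, 0.8, 0.5], "sig-mug"),
    DatabaseObject("tree", [0.1, 0.3, 0.9], "sig-tree"),
]
found, checks = prioritized_search(
    [0.25, 0.75, 0.5], "sig-mug", database, lambda a, b: a == b
)
```

In this toy setup the object most similar in low-level features is checked first, so a match can be found with fewer signature comparisons than an unordered scan of the database would need. Setting `max_checks` trades recall for speed, since a true match ranked below the cutoff is never examined.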

[1]  Robert B. Fisher,et al.  Object-based visual attention for computer vision , 2003, Artif. Intell..

[2]  Albert Ali Salah,et al.  A Selective Attention-Based Method for Visual Pattern Recognition with Application to Handwritten Digit Recognition and Face Recognition , 2002, IEEE Trans. Pattern Anal. Mach. Intell..

[3]  John K. Tsotsos,et al.  Modeling Visual Attention via Selective Tuning , 1995, Artif. Intell..

[4]  A. Treisman,et al.  A feature-integration theory of attention , 1980, Cognitive Psychology.

[5]  Christof Koch,et al.  Attentional Selection for Object Recognition - A Gentle Way , 2002, Biologically Motivated Computer Vision.

[6]  Erich Rome Simulating Visual Attention for Object Recognition , 2004 .

[7]  Pietro Perona,et al.  On the usefulness of attention for object recognition , 2004 .

[8]  Laurent Itti,et al.  Neuromorphic algorithms for computer vision and attention , 2001, SPIE Optics + Photonics.

[9]  L. Itti,et al.  Modeling the influence of task on attention , 2005, Vision Research.

[10]  David G. Lowe,et al.  Object recognition from local scale-invariant features , 1999, Proceedings of the Seventh IEEE International Conference on Computer Vision.

[11]  David G. Lowe,et al.  Distinctive Image Features from Scale-Invariant Keypoints , 2004 .

[12]  David G. Lowe,et al.  Towards a Computational Model for Object Recognition in IT Cortex , 2000, Biologically Motivated Computer Vision.

[13]  Pietro Perona,et al.  Is bottom-up attention useful for object recognition? , 2004, CVPR 2004.

[14]  U. Neisser VISUAL SEARCH. , 1964, Scientific American.

[15]  Franz Kummert,et al.  Dynamic search-space pruning for time-constrained speech recognition , 2002, INTERSPEECH.

[16]  C. Koch,et al.  Computational modelling of visual attention , 2001, Nature Reviews Neuroscience.