Modeling Objects with Local Descriptors of Biologically Motivated Selective Attention

Recent work in visual retrieval shows that the bag-of-features (BoF) approach is promising for object recognition and categorization, and local descriptors such as SIFT have produced impressive results on object images. The main idea of BoF is to represent each image as an orderless collection of local keypoint features. However, not all local keypoint features are equally important for retrieving objects; the user is often interested in the salient regions of object classes. We therefore propose a new method for modeling attention objects with local descriptors. The proposed model, in conjunction with a biologically motivated selective attention model, can extract the salient regions of each image. To model attention objects, we propose a new attention-based SIFT algorithm that uses scale contrast information together with local keypoint features to reflect saliency in object classes more accurately. Experimental results on both the Caltech 101 object category dataset and the TRECVID 2007 dataset show that the proposed model achieves competitive performance.
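
The abstract gives no implementation details, but the saliency-gated bag-of-features pipeline it describes can be illustrated with a minimal sketch. The code below assumes OpenCV's SIFT detector and scikit-learn's KMeans; the contrast_saliency helper and the sal_thresh parameter are illustrative placeholders standing in for the paper's biologically motivated selective attention model, not part of the original method.

```python
# Minimal sketch of saliency-gated bag-of-features.
# Assumes OpenCV (cv2.SIFT_create) and scikit-learn; the saliency map here is
# a crude center-surround contrast stand-in, NOT the paper's attention model.
import cv2
import numpy as np
from sklearn.cluster import KMeans

def contrast_saliency(gray, ksize=31):
    """Placeholder saliency map: absolute center-surround intensity contrast."""
    blur = cv2.GaussianBlur(gray.astype(np.float32), (ksize, ksize), 0)
    sal = np.abs(gray.astype(np.float32) - blur)
    return cv2.normalize(sal, None, 0.0, 1.0, cv2.NORM_MINMAX)

def salient_sift_descriptors(image, sal_thresh=0.3):
    """Keep only the SIFT descriptors whose keypoints fall in salient regions."""
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    keypoints, descriptors = cv2.SIFT_create().detectAndCompute(gray, None)
    if descriptors is None:
        return np.empty((0, 128), dtype=np.float32)
    sal = contrast_saliency(gray)
    keep = [i for i, kp in enumerate(keypoints)
            if sal[int(kp.pt[1]), int(kp.pt[0])] >= sal_thresh]
    return descriptors[keep]

def bof_histogram(descriptors, codebook: KMeans):
    """Quantize descriptors against a visual codebook into a normalized BoF histogram."""
    words = codebook.predict(descriptors)
    hist, _ = np.histogram(words, bins=np.arange(codebook.n_clusters + 1))
    return hist / max(hist.sum(), 1)
```

In this sketch the codebook would be trained with KMeans on salient descriptors pooled over the training images; each image is then represented by bof_histogram for retrieval or classification.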
