Object recognition based on the Region of Interest and optimal Bag of Words model

The Bag of Words (BoW) model has been widely used in conventional object recognition tasks. Unlike existing methods, this paper proposes an object recognition method based on the Region of Interest (ROI) and an optimal Bag of Words model. The method comprises the following steps: (1) the ROI is extracted by combining Shi-Tomasi corners with the Itti saliency map; (2) SIFT features are detected and described on the images of interest; (3) a visual codebook is generated with Gaussian mixture models that rely on the clustering results of k-means++; (4) the similarity between each visual word and the corresponding local features is computed with discriminative posterior pseudo-probabilities to construct a visual word soft histogram for image representation; (5) a support vector machine (SVM) performs image classification and recognition. Experiments are performed on the MSRC 21-class database. The results show that the proposed method recognizes images more accurately.
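
Steps (3) and (4) can be illustrated with a minimal NumPy sketch. It is not the paper's implementation and rests on several assumptions: the Gaussian mixture uses diagonal covariances, "relies on the clustering results of k-means++" is read as using k-means++ seeded k-means to initialize the mixture, ordinary GMM posterior responsibilities stand in for the paper's posterior pseudo-probabilities, and random vectors stand in for SIFT descriptors. All function names are hypothetical.

```python
import numpy as np

def kmeans_pp_init(X, k, rng):
    """Pick k initial centers with k-means++ seeding (D^2 sampling)."""
    n = X.shape[0]
    centers = [X[rng.integers(n)]]
    for _ in range(1, k):
        c = np.array(centers)
        # Squared distance from each point to its nearest chosen center.
        d2 = ((X[:, None, :] - c[None, :, :]) ** 2).sum(-1).min(axis=1)
        centers.append(X[rng.choice(n, p=d2 / d2.sum())])
    return np.array(centers)

def kmeans(X, k, rng, iters=20):
    """Lloyd iterations started from k-means++ seeds; returns the centers."""
    centers = kmeans_pp_init(X, k, rng)
    for _ in range(iters):
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d2.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return centers

def posteriors(X, weights, means, var):
    """Responsibility of each mixture component (visual word) for each row of X."""
    diff2 = (X[:, None, :] - means[None, :, :]) ** 2
    log_gauss = -0.5 * ((diff2 / var[None]).sum(-1)
                        + np.log(2 * np.pi * var).sum(-1)[None, :])
    log_p = np.log(weights)[None, :] + log_gauss
    log_p -= log_p.max(axis=1, keepdims=True)  # for numerical stability
    p = np.exp(log_p)
    return p / p.sum(axis=1, keepdims=True)

def fit_gmm(X, k, rng, em_iters=10, eps=1e-6):
    """Diagonal-covariance GMM fitted by EM, with means initialized by k-means."""
    means = kmeans(X, k, rng)
    weights = np.full(k, 1.0 / k)
    var = np.tile(X.var(axis=0) + eps, (k, 1))
    for _ in range(em_iters):
        r = posteriors(X, weights, means, var)   # E-step
        nk = r.sum(axis=0) + eps                 # M-step
        weights = nk / nk.sum()
        means = (r.T @ X) / nk[:, None]
        var = np.maximum((r.T @ X ** 2) / nk[:, None] - means ** 2, eps)
    return weights, means, var

def soft_histogram(descriptors, gmm):
    """Average the per-descriptor posteriors into one soft histogram per image."""
    return posteriors(descriptors, *gmm).mean(axis=0)

rng = np.random.default_rng(0)
train = rng.normal(size=(300, 8))            # stand-in for pooled SIFT descriptors
gmm = fit_gmm(train, k=5, rng=rng)           # codebook of 5 visual words
hist = soft_histogram(rng.normal(size=(40, 8)), gmm)
print(hist)
```

Each image is thus represented by a vector with one entry per visual word that sums to one; such vectors could then be passed to an SVM for step (5). Real SIFT descriptors are 128-dimensional, and a practical codebook would use far more than five words.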
