A novel method for image classification based on bag of visual words

We apply salient region extraction to the BOW model, which produces more representative visual words and avoids the disturbance of complex background information. The topological structure of visual words integrates global spatial information of visual words into the BOW model. The Delaunay triangulation method integrates local spatial information of visual words into the traditional BOW model.

The bag of words (BOW) model is widely applied in image classification. The traditional BOW model neglects spatial information and object shape information, so it cannot fully distinguish between image features. In this paper, we combine salient regions with the topological structure of visual words. This combination not only produces more representative visual words but also efficiently avoids the disturbance of complex backgrounds. First, the salient regions are extracted and the BOW model is built on them. Second, to describe the characteristics of the image more accurately and to resist the influence of background information, the topological structure of visual words and the Delaunay triangulation method are employed to integrate global and local spatial information, respectively. The performance of the proposed algorithm is tested on several datasets and compared with other models. The experimental results indicate that the proposed method provides higher classification accuracy.
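The pipeline outlined in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the saliency detector, the descriptor, or how the Delaunay triangulation is encoded, so the function names, the toy data, and the choice to count visual-word pairs along Delaunay edges are all assumptions made for illustration. The global topological structure is not covered here. The sketch assumes NumPy and SciPy are available and that at least three non-collinear keypoints fall inside the salient region.

```python
import numpy as np
from scipy.spatial import Delaunay


def assign_words(descriptors, codebook):
    """Hard-assign each descriptor to its nearest codebook entry (visual word)."""
    # Squared Euclidean distance between every descriptor and every codeword.
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)


def salient_bow_histogram(points, words, saliency_mask, n_words):
    """L1-normalised word histogram over keypoints inside the salient region.

    points holds (x, y) keypoint coordinates; saliency_mask is indexed [row, col].
    Returns the histogram and the boolean mask of kept keypoints.
    """
    rows = points[:, 1].astype(int)
    cols = points[:, 0].astype(int)
    keep = saliency_mask[rows, cols]
    hist = np.bincount(words[keep], minlength=n_words).astype(float)
    total = hist.sum()
    return (hist / total if total > 0 else hist), keep


def delaunay_word_pairs(points, words, n_words):
    """Count unordered visual-word pairs joined by a Delaunay edge.

    This is one hypothetical way to encode local spatial context.
    """
    pairs = np.zeros((n_words, n_words), dtype=int)
    edges = set()
    for a, b, c in Delaunay(points).simplices:
        for i, j in ((a, b), (b, c), (a, c)):
            edges.add((min(i, j), max(i, j)))  # store each edge once
    for i, j in edges:
        u, v = sorted((words[i], words[j]))
        pairs[u, v] += 1  # upper-triangular count matrix
    return pairs


# Toy data (hypothetical): 3 codewords, 6 keypoints on an 8x8 image.
codebook = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
descriptors = np.array([[0.1, 0.0], [0.9, 1.1], [0.0, 0.2],
                        [1.0, 1.0], [5.0, 4.8], [1.0, 0.9]])
points = np.array([[0, 0], [4, 0], [4, 4], [0, 4], [2, 2], [7, 7]], dtype=float)

# Stand-in for a saliency detector: columns 0..4 are treated as salient.
saliency_mask = np.zeros((8, 8), dtype=bool)
saliency_mask[:, :5] = True

words = assign_words(descriptors, codebook)
hist, keep = salient_bow_histogram(points, words, saliency_mask, n_words=3)
pairs = delaunay_word_pairs(points[keep], words[keep], n_words=3)
```

In this toy run, the keypoint at (7, 7) lies outside the salient mask, so it contributes neither to the histogram nor to the triangulation. A classifier could then be trained on the concatenation of `hist` and the flattened `pairs` matrix, although how the paper combines its features is not stated in the abstract.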
