Large visual words for large scale image classification

Recently, using large visual vocabularies (codebooks) to quantize local feature descriptors into a large number of disjoint subsets, termed visual words (or large visual words), has become an important research topic for many computer vision problems, including near-duplicate image retrieval and object retrieval. In general, large visual words impose a heavy time and memory burden on both vocabulary construction and search, especially in large-scale applications. In this paper, we present an efficient approach for generating large visual words from a very compact vocabulary, namely two dictionaries learned with sparse non-negative matrix factorization (NMF). After piecewise sparse decomposition of the features with the two learned dictionaries, we map the pair of indices of the dictionary bases corresponding to the maximum elements of the two sparse codes to a large set of visual words, on the assumption that data with similar properties share the same base with the largest sparse coefficient. With an inverted file structure built over the large visual words, K-nearest neighbors (KNN) can be retrieved efficiently. We can therefore classify images very efficiently by incorporating our fast KNN search based on large visual words into the SVM-KNN method. Experiments on the public Oxford dataset and the ACM Multimedia 2013 Yahoo! image classification challenge dataset show that our approach is both effective and efficient.
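
The index-pair mapping and inverted-file lookup described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the feature is assumed to be split into two contiguous pieces, one per dictionary; the function names (`nonneg_code`, `visual_word`, `build_inverted_file`, `knn`) are hypothetical; and a clipped correlation with the dictionary bases stands in for the sparse NMF coding step, which the abstract does not specify in detail.

```python
from collections import defaultdict

import numpy as np


def nonneg_code(D, x):
    # Stand-in (assumption) for sparse non-negative coding:
    # correlations with the dictionary bases, clipped at zero.
    return np.maximum(D.T @ x, 0.0)


def visual_word(x, D1, D2):
    # Split the feature into two pieces (assumed layout), code each piece
    # with its own dictionary, and combine the two argmax indices into
    # one word id in the range [0, k1 * k2).
    split = D1.shape[0]
    i = int(np.argmax(nonneg_code(D1, x[:split])))
    j = int(np.argmax(nonneg_code(D2, x[split:])))
    return i * D2.shape[1] + j


def build_inverted_file(X, D1, D2):
    # Map each visual word id to the list of row indices of X assigned to it.
    inverted = defaultdict(list)
    for idx, x in enumerate(X):
        inverted[visual_word(x, D1, D2)].append(idx)
    return inverted


def knn(query, X, inverted, D1, D2, k):
    # Restrict the search to items sharing the query's visual word,
    # then rank those candidates by Euclidean distance.
    candidates = inverted.get(visual_word(query, D1, D2), [])
    if not candidates:
        return []
    dists = np.linalg.norm(X[candidates] - query, axis=1)
    order = np.argsort(dists, kind="stable")[:k]
    return [candidates[t] for t in order]


# Toy usage: 8 + 16 = 24 bases address 8 * 16 = 128 visual words.
rng = np.random.default_rng(0)
D1 = np.abs(rng.normal(size=(4, 8)))
D2 = np.abs(rng.normal(size=(4, 16)))
D1 /= np.linalg.norm(D1, axis=0)
D2 /= np.linalg.norm(D2, axis=0)
X = np.abs(rng.normal(size=(200, 8)))
inverted = build_inverted_file(X, D1, D2)
neighbors = knn(X[0], X, inverted, D1, D2, k=5)
print(neighbors)
```

The point of the sketch is the vocabulary size: two dictionaries with k1 and k2 bases need only k1 + k2 stored vectors yet index k1 * k2 words, and a query touches only one inverted-file bucket rather than the whole collection.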
