Discriminative codebook learning for Web image search

Given the explosive growth of Web images, image search plays an increasingly important role in our daily lives. The visual representation of an image is fundamental to the quality of content-based image search. Recently, the bag-of-visual-words model has been widely used for image representation and has demonstrated promising performance in many applications. In the bag-of-visual-words model, the codebook (visual vocabulary) plays a crucial role. The conventional codebook, generated via unsupervised clustering, does not embed the labeling information of images and is therefore less discriminative. Although some research has constructed codebooks that take labeling information into account, very few attempts have been made to exploit the manifold geometry of the local feature space to improve the codebook's discriminative ability. In this paper, we propose a novel discriminative codebook learning method that introduces subspace learning into codebook construction and leverages it to find a contextual local descriptor subspace that captures discriminative information. Discriminative codebook construction and contextual subspace learning are formulated as an optimization problem and learned simultaneously. The effectiveness of the proposed method is evaluated through visual reranking experiments on two real Web image search datasets.
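For orientation, the conventional baseline mentioned above (a codebook obtained by unsupervised clustering, followed by histogram quantization) can be sketched as follows. This is a minimal illustration of that baseline only, not the proposed discriminative method. The function names `build_codebook` and `bovw_histogram`, the use of plain k-means, and the tiny hand-made 2-D "descriptors" are all assumptions made for the example; real systems would cluster high-dimensional local descriptors.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(codebook, d):
    """Index of the codeword closest to descriptor d."""
    return min(range(len(codebook)), key=lambda k: sq_dist(codebook[k], d))

def build_codebook(descriptors, k, iters=20, seed=0):
    """Plain k-means (Lloyd's algorithm); image labels are never used."""
    rng = random.Random(seed)
    codebook = [list(d) for d in rng.sample(descriptors, k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for d in descriptors:
            clusters[nearest(codebook, d)].append(d)
        for j, members in enumerate(clusters):
            if members:  # keep the old codeword if a cluster is empty
                codebook[j] = [sum(col) / len(members) for col in zip(*members)]
    return codebook

def bovw_histogram(codebook, image_descriptors):
    """L1-normalized histogram of codeword assignments for one image."""
    hist = [0.0] * len(codebook)
    for d in image_descriptors:
        hist[nearest(codebook, d)] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total else hist

# Hypothetical toy descriptors: two well-separated groups in 2-D.
group_a = [[0.0, 0.0], [0.1, 0.0], [0.2, 0.0], [0.3, 0.0]]
group_b = [[5.0, 5.0], [5.1, 5.0], [5.2, 5.0], [5.3, 5.0]]

codebook = build_codebook(group_a + group_b, k=2)

# An "image" whose descriptors all come from one group puts all of its
# histogram mass in a single bin.
hist_a = bovw_histogram(codebook, group_a)

# A mixed "image" with one descriptor from group A and three from group B.
hist_mixed = bovw_histogram(codebook, [group_a[0]] + group_b[:3])
```

Because the clustering step never sees image labels, two semantically different images can receive similar histograms whenever their descriptors fall near the same codewords; this is the limitation that label-aware codebook learning aims to address.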
