Cascade Category-Aware Visual Search

Incorporating image classification into an image retrieval system brings several advantages. For instance, the search space can be narrowed by rejecting images in categories irrelevant to the query, and the retrieved images can be made more semantically consistent by indexing and returning images of the relevant categories together. However, because classification targets recognition accuracy while retrieval targets scalability, it is hard to incorporate most image classification methods into large-scale image search efficiently. To study this problem, we propose cascade category-aware visual search, which uses a weak category clue to improve retrieval accuracy, efficiency, and memory consumption. To capture the category and visual clues of an image, we first learn category-visual words, which are discriminative and repeatable local features labeled with categories. By identifying category-visual words in database images, we can discard noisy local features and extract each image's visual and category clues, which are then recorded in a hierarchical index structure. Our retrieval system narrows the search space by: 1) filtering noisy local features in the query; 2) rejecting irrelevant categories in the database; and 3) performing discriminative visual search within the relevant categories. The proposed algorithm is tested on object search, landmark search, and large-scale similar image search on the large-scale LSVRC10 data set. Although the category clue introduced is weak, our algorithm still shows substantial advantages in retrieval accuracy, efficiency, and memory consumption over the state of the art.
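
The three-stage cascade can be illustrated with a minimal sketch. It is not the paper's implementation: the toy vocabulary `CV_WORDS`, the functions `build_index` and `search`, the simple vote-based category selection, and the match-count scoring are all assumptions made for illustration. Each image is represented only by a list of quantized local-feature ids, and any id absent from the vocabulary is treated as a noisy feature.

```python
from collections import Counter, defaultdict

# Hypothetical toy vocabulary: category-visual word id -> category label.
# Ids missing from this map stand in for noisy local features.
CV_WORDS = {1: "dog", 2: "dog", 3: "car", 4: "car", 5: "tower"}


def build_index(database):
    """Build a two-level index: category -> word id -> set of image ids."""
    index = defaultdict(lambda: defaultdict(set))
    for image_id, words in database.items():
        for w in words:
            if w in CV_WORDS:  # noisy features are never indexed
                index[CV_WORDS[w]][w].add(image_id)
    return index


def search(query_words, index, top_categories=1):
    """Cascade search: filter query, pick categories, match within them."""
    # Stage 1: filter noisy local features in the query.
    kept = [w for w in query_words if w in CV_WORDS]
    # Stage 2: keep only the most-voted categories (assumed selection rule).
    votes = Counter(CV_WORDS[w] for w in kept)
    relevant = [c for c, _ in votes.most_common(top_categories)]
    # Stage 3: visual search restricted to the relevant categories,
    # scoring each image by its number of matched words.
    scores = Counter()
    for c in relevant:
        for w in kept:
            for image_id in index[c].get(w, ()):
                scores[image_id] += 1
    return [image_id for image_id, _ in scores.most_common()]


database = {
    "img_dog": [1, 2, 99],    # 99 is a noisy feature
    "img_car": [3, 4, 98],    # 98 is a noisy feature
    "img_mixed": [1, 3, 5],
}
index = build_index(database)
results = search([1, 2, 97, 3], index)
print(results)
```

In this toy run the query votes mostly for the "dog" category, so the index entries under "car" and "tower" are never visited; that is the sense in which even a weak category clue reduces the search space.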
