Facilitating Image Search With a Scalable and Compact Semantic Mapping

This paper introduces a novel approach to facilitating image search based on a compact semantic embedding. A method is developed to explicitly map concepts and image contents into a unified latent semantic space, in which each semantic concept is represented by a prototype. A linear embedding matrix is then learned that maps images into the semantic space such that each image is closer to its relevant concept prototype than to any other prototype. At search time, query keywords are equated with semantic concepts, and the images mapped into the vicinity of the corresponding prototype are retrieved. In addition, a computationally efficient method is introduced to incorporate new semantic concept prototypes into the semantic space by updating the embedding matrix. This improves the scalability of the method and allows it to be applied to dynamic image repositories. The proposed approach therefore not only narrows the semantic gap but also supports an efficient image search process. We have carried out extensive experiments on various cross-modality image search tasks over three widely used benchmark image datasets. The results demonstrate the superior effectiveness, efficiency, and scalability of the proposed approach.
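The retrieval scheme described above can be illustrated with a minimal sketch. It is not the method of the paper: the abstract does not give the learning objective, so the sketch substitutes plain ridge regression for learning the linear embedding matrix, and it assumes one-hot vectors as concept prototypes. The names `fit_embedding` and `search`, the regularization weight `reg`, and the synthetic data are all hypothetical and chosen only for illustration.

```python
import numpy as np

def fit_embedding(X, labels, prototypes, reg=1e-2):
    """Learn a linear map W (k x d) that pulls each image toward its
    concept prototype. ASSUMPTION: ridge regression is used here as a
    simple stand-in for the learning procedure of the paper."""
    targets = prototypes[labels]                 # (n, k) prototype per image
    d = X.shape[1]
    A = X.T @ X + reg * np.eye(d)                # (d, d) regularized Gram matrix
    return np.linalg.solve(A, X.T @ targets).T   # (k, d)

def search(concept, X, W, prototypes, top_k=5):
    """Return indices of the top_k images whose embeddings lie closest
    to the prototype of the queried concept."""
    Z = X @ W.T                                  # images in the semantic space
    dist = np.linalg.norm(Z - prototypes[concept], axis=1)
    return np.argsort(dist)[:top_k]

# Synthetic data (hypothetical): 3 concepts, 20 images each, 20-D features.
rng = np.random.default_rng(0)
labels = np.repeat(np.arange(3), 20)
centers = 3.0 * rng.normal(size=(3, 20))
X = centers[labels] + 0.1 * rng.normal(size=(60, 20))

prototypes = np.eye(3)                           # ASSUMPTION: one-hot prototypes
W = fit_embedding(X, labels, prototypes)
hits = search(1, X, W, prototypes)               # query keyword -> concept 1
```

In this sketch a query keyword is reduced to a concept index, and retrieval is a nearest-neighbor ranking around that concept's prototype in the low-dimensional space, which is what keeps the search step cheap.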
