Image Retrieval for Complex Queries Using Knowledge Embedding

As image-based applications grow in popularity, users are retrieving images with increasingly sophisticated and complex queries. We consider three types of complex queries: long, ambiguous, and abstract. Each type has its own characteristics and complexities, which lead to imprecise and incomplete image retrieval. Existing image retrieval methods cannot handle the high complexity of such queries. Search engines need to integrate knowledge into their image retrieval process to obtain the rich semantics required for effective retrieval. We propose a framework, Image Retrieval using Knowledge Embedding (ImReKE), that embeds knowledge with images and queries so that retrieval approaches can better understand the context of both. ImReKE (IR_Approach, Knowledge_Base) takes two inputs: an image retrieval approach and a knowledge base. It selects quality concepts (concepts that possess properties such as rarity and newness, among others) from the knowledge base to provide rich semantic representations of queries and images, which the image retrieval approach then leverages. For the first time, an effective knowledge base that exploits both the visual and textual information of concepts has been developed. Our extensive experiments demonstrate that the proposed framework significantly improves image retrieval for all three types of complex queries. The improvement is especially pronounced for abstract queries, which the existing literature has not yet addressed explicitly. We also compare the quality of our knowledge base with that of existing text-based knowledge bases such as ConceptNet and ImageNet.
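The two-input interface described above can be illustrated with a minimal sketch. It is not the authors' implementation: the knowledge base layout, the `quality` scores, the threshold, the toy index, and every function name below are assumptions made for illustration. It also covers only the query side (enriching a query with selected concepts) and passes the query as an extra argument, whereas the framework also embeds knowledge with images.

```python
from typing import Callable, Dict, List

# Hypothetical knowledge base: term -> {related concept: quality score in [0, 1]}.
# The score stands in for properties such as rarity and newness.
KnowledgeBase = Dict[str, Dict[str, float]]
# Hypothetical retrieval approach: query terms in, ranked image ids out.
IRApproach = Callable[[List[str]], List[str]]


def select_quality_concepts(term: str, kb: KnowledgeBase,
                            threshold: float = 0.5) -> List[str]:
    """Return concepts related to `term` whose score meets the threshold."""
    related = kb.get(term, {})
    ranked = sorted(related.items(), key=lambda kv: kv[1], reverse=True)
    return [concept for concept, score in ranked if score >= threshold]


def imreke(ir_approach: IRApproach, kb: KnowledgeBase, query: str) -> List[str]:
    """Enrich the query with quality concepts, then delegate retrieval."""
    terms = query.lower().split()
    enriched = list(terms)
    for term in terms:
        for concept in select_quality_concepts(term, kb):
            if concept not in enriched:
                enriched.append(concept)
    return ir_approach(enriched)


# Toy data (assumed): an abstract query term linked to concrete concepts.
TOY_KB: KnowledgeBase = {"freedom": {"bird": 0.9, "sky": 0.7, "thing": 0.1}}
TOY_INDEX = {"img1": {"bird", "sky"}, "img2": {"car"}, "img3": {"sky"}}


def tag_overlap_search(terms: List[str]) -> List[str]:
    """Toy retrieval approach: rank images by number of matching tags."""
    scores = {img: len(tags & set(terms)) for img, tags in TOY_INDEX.items()}
    hits = [img for img, score in scores.items() if score > 0]
    return sorted(hits, key=lambda img: (-scores[img], img))


if __name__ == "__main__":
    print(tag_overlap_search(["freedom"]))                 # plain query
    print(imreke(tag_overlap_search, TOY_KB, "freedom"))   # enriched query
```

In this toy setting the plain abstract query matches no image tags, while the enriched query reaches images tagged with the selected concepts. Any retrieval approach with the same call signature could be substituted for `tag_overlap_search`.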
