Semantic image retrieval for complex queries using a knowledge parser

In order to improve the retrieval accuracy of image retrieval systems, research focus has shifted from designing sophisticated low-level feature extraction algorithms to combining image retrieval with rich semantics and knowledge-based methods. In this paper, we aim to improve text-based image retrieval for complex natural language queries by using a semantic parser (Knowledge Parser, or K-Parser). From natural language text, the K-Parser extracts a graphical semantic representation of the objects involved, their properties, and their relations. We analyze both the textual image captions and the natural language queries with the K-Parser. As a technical solution, we leverage RDF in two ways: first, we store the parsed image captions as RDF triples; second, we translate image queries into SPARQL queries. When applying this approach to the Flickr8k dataset with a set of 16 custom queries, we observe that the K-Parser exhibits some biases that reduce query accuracy. We propose two techniques to address these weaknesses: (1) we introduce a set of rules that transform the output of the K-Parser and fix some basic, recurrent parsing mistakes that occur on the Flickr8k captions; (2) we leverage two popular commonsense knowledge bases, ConceptNet and WordNet, to improve the accuracy of queries on broad concepts. Using these two techniques, we can fix most of the initial retrieval errors and accurately execute our set of 16 queries on the Flickr8k dataset.
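The retrieval idea above (captions stored as triples, queries answered by graph-pattern matching, broad concepts resolved through commonsense knowledge) can be illustrated with a toy sketch. It assumes captions have already been parsed into (subject, predicate, object) triples. The node names and predicates (`agent`, `recipient`, `is_a`, `location`) are hypothetical and are not claimed to be the K-Parser's actual output, and the small `HYPERNYMS` table is an assumed stand-in for WordNet or ConceptNet lookups. Instead of an RDF store and a SPARQL engine, plain Python sets and a tiny matcher for a SPARQL-style basic graph pattern are used (terms starting with `?` are variables).

```python
# Hypothetical parsed captions: image id -> set of (subject, predicate, object) triples.
# Predicate names are illustrative assumptions, not the K-Parser's real vocabulary.
CAPTIONS = {
    "img1": {("dog_1", "is_a", "dog"), ("run_1", "is_a", "run"),
             ("run_1", "agent", "dog_1"), ("run_1", "location", "beach_1"),
             ("beach_1", "is_a", "beach")},
    "img2": {("cat_1", "is_a", "cat"), ("sleep_1", "is_a", "sleep"),
             ("sleep_1", "agent", "cat_1")},
    "img3": {("man_1", "is_a", "man"), ("ride_1", "is_a", "ride"),
             ("ride_1", "agent", "man_1"), ("ride_1", "recipient", "bike_1"),
             ("bike_1", "is_a", "bicycle")},
}

# Toy stand-in for WordNet/ConceptNet: concept -> directly broader concepts.
HYPERNYMS = {"dog": {"canine"}, "canine": {"animal"}, "cat": {"animal"},
             "man": {"person"}, "bicycle": {"vehicle"}}


def ancestors(concept):
    """Return every broader concept reachable from `concept` (transitive closure)."""
    seen, stack = set(), [concept]
    while stack:
        for parent in HYPERNYMS.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def expand(triples):
    """Add an is_a triple for each ancestor, so broad concepts match specific ones."""
    extra = {(s, "is_a", h) for s, p, o in triples if p == "is_a" for h in ancestors(o)}
    return triples | extra


def match(pattern, triples, binding=None):
    """Yield variable bindings satisfying every triple pattern (a basic graph pattern)."""
    binding = binding or {}
    if not pattern:
        yield binding
        return
    first, rest = pattern[0], pattern[1:]
    for triple in triples:
        new = dict(binding)
        ok = True
        for term, value in zip(first, triple):
            if term.startswith("?"):
                # Bind an unbound variable, or check consistency with an earlier binding.
                if new.setdefault(term, value) != value:
                    ok = False
                    break
            elif term != value:
                ok = False
                break
        if ok:
            yield from match(rest, triples, new)


def retrieve(pattern, use_expansion=True):
    """Return the ids of images whose caption graph matches the pattern."""
    hits = []
    for image_id, triples in sorted(CAPTIONS.items()):
        graph = expand(triples) if use_expansion else triples
        if next(match(pattern, graph), None) is not None:
            hits.append(image_id)
    return hits


# "An animal running", roughly what a SPARQL WHERE clause would express.
ANIMAL_RUNNING = [("?a", "is_a", "animal"), ("?e", "agent", "?a"), ("?e", "is_a", "run")]
# "A person riding a vehicle".
PERSON_RIDING_VEHICLE = [("?p", "is_a", "person"), ("?e", "agent", "?p"),
                         ("?e", "is_a", "ride"), ("?e", "recipient", "?v"),
                         ("?v", "is_a", "vehicle")]

print(retrieve(ANIMAL_RUNNING, use_expansion=False))  # → []
print(retrieve(ANIMAL_RUNNING))                       # → ['img1']
```

Without expansion, the query for "animal" misses the caption that only says "dog"; materializing hypernyms into each caption graph fixes this. Rewriting the query into a disjunction of narrower concepts would be an alternative design; the sketch does not claim to reproduce how the paper integrates ConceptNet and WordNet.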
