Improving Natural Language Queries Search and Retrieval through Semantic Image Annotation Understanding

Retrieving images using detailed natural language queries remains a difficult challenge. Traditional annotation-based image retrieval systems that rely on word-matching techniques cannot efficiently support such queries. Significant improvements can be achieved through a semantic understanding of both the query sentences and the image annotations. This paper presents a two-stage semantic understanding approach for natural language query sentences. In the first stage, the Stanford parser and a purpose-built rule-based relation extraction tool are applied in a triple extraction process that efficiently extracts objects' attributes, instances, and the natural language annotation relationships involving these objects. The second stage integrates the extracted relations with an external commonsense knowledge source in a mapping process that assigns high-level semantic meanings to the extracted triples. Experiments evaluating the benefit of the proposed semantic understanding are conducted on a test set of natural language sentences from the Flickr8k dataset. The results show that the proposed approach extracts relational triples with an average accuracy of 97% across the different types of annotation relationships: attribute and instance relations, multiword dependence relations, and semantic relations.
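The rule-based first stage described above can be illustrated with a minimal sketch of subject-verb-object and attribute triple extraction over dependency-parse output. This is not the paper's actual tool: the tuple format `(word, dep, head_index)` and the two rules shown (an `nsubj` paired with a `dobj` under the same predicate, and an `amod` mapped to an attribute relation) are assumptions for illustration only; a real system would consume Stanford parser output and cover many more dependency patterns.

```python
# Illustrative sketch of rule-based triple extraction from a dependency
# parse. The input format (word, dep_label, head_index) is an assumption
# for this example, not the paper's interface; head_index is -1 for ROOT.

def extract_triples(tokens):
    """Return (subject, predicate, object) and attribute triples."""
    triples = []
    for i, (word, dep, head) in enumerate(tokens):
        if dep == "amod":
            # Adjective modifier -> attribute relation on its head noun.
            triples.append((tokens[head][0], "has_attribute", word))
        elif dep == "nsubj":
            # Subject found: its head is the predicate; look for a
            # direct object attached to the same predicate.
            pred_idx = head
            for obj_word, obj_dep, obj_head in tokens:
                if obj_dep == "dobj" and obj_head == pred_idx:
                    triples.append((word, tokens[pred_idx][0], obj_word))
    return triples

# "A black dog chases a ball" with simplified dependency labels.
sentence = [
    ("A", "det", 2),
    ("black", "amod", 2),
    ("dog", "nsubj", 3),
    ("chases", "ROOT", -1),
    ("a", "det", 5),
    ("ball", "dobj", 3),
]
print(extract_triples(sentence))
# -> [('dog', 'has_attribute', 'black'), ('dog', 'chases', 'ball')]
```

The second stage would then map such triples onto a commonsense knowledge source (e.g., linking `dog` to a hypernym such as `animal`) to enrich them with high-level semantics.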
