RQUERY: Rewriting Natural Language Queries on Knowledge Graphs to Alleviate the Vocabulary Mismatch Problem

For non-expert users, a textual query is the simplest and most popular means of communicating with a retrieval or question answering system. However, such queries may not match the vocabulary of the background knowledge. Query expansion and query rewriting address this problem, but they risk yielding a large number of irrelevant words, which in turn degrades both runtime and accuracy. In this paper, we propose a new method for automatically rewriting input queries over graph-structured RDF knowledge bases. We employ a Hidden Markov Model to determine the most suitable derived words from linguistic resources. We introduce the concept of triple-based co-occurrence for recognizing co-occurring words in RDF data. The model is bootstrapped with three statistical distributions. Our experimental study demonstrates the superiority of the proposed approach over the traditional n-gram model.
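The abstract does not give the model's parameters, so the following Python code is only a minimal sketch of the general idea under stated assumptions. Hidden states are candidate derived words for each query keyword. Transition probabilities come from triple-based co-occurrence, which is assumed here to mean that two words appear in the labels of the same triple's subject, predicate or object. The most likely rewriting is then found with Viterbi decoding, the standard HMM decoding algorithm. The toy triples, candidate lists, emission probabilities, add-one smoothing and the helper names `triple_cooccurrence`, `make_transition` and `viterbi` are all hypothetical. They are not the paper's three bootstrapped distributions.

```python
import math
from collections import Counter
from itertools import combinations


def triple_cooccurrence(triples):
    """Count unordered word pairs occurring in the same triple (assumed definition)."""
    pair_counts = Counter()
    for triple in triples:
        # Pool the words of the subject, predicate and object labels.
        words = {tok for label in triple for tok in label.lower().split()}
        for a, b in combinations(sorted(words), 2):
            pair_counts[(a, b)] += 1
    return pair_counts


def make_transition(pair_counts):
    """Return a function giving log P(w | u) with add-one smoothing (an assumption)."""
    vocab = {w for pair in pair_counts for w in pair}
    totals = Counter()
    for (a, b), c in pair_counts.items():
        totals[a] += c
        totals[b] += c

    def log_trans(u, w):
        c = pair_counts[tuple(sorted((u, w)))]  # Counter returns 0 for unseen pairs
        return math.log((c + 1) / (totals[u] + len(vocab)))

    return log_trans


def viterbi(candidates, log_emit, log_trans):
    """Return (log score, path) choosing one candidate word per query keyword.

    A uniform start distribution is assumed and omitted, since it only adds a constant.
    """
    best = {w: (log_emit(0, w), [w]) for w in candidates[0]}
    for i in range(1, len(candidates)):
        step = {}
        for w in candidates[i]:
            # Keep only the best-scoring predecessor path for state w at position i.
            score, path = max(
                ((s + log_trans(u, w), p) for u, (s, p) in best.items()),
                key=lambda item: item[0],
            )
            step[w] = (score + log_emit(i, w), path + [w])
        best = step
    return max(best.values(), key=lambda item: item[0])


# Hypothetical toy RDF data, one label string per subject, predicate and object.
triples = [
    ("film", "director", "person"),
    ("movie", "starring", "actor"),
    ("film", "starring", "actor"),
    ("movie", "director", "person"),
    ("film", "director", "filmmaker"),
]
# Hypothetical derived words for the query "movie creator", e.g. from a lexical resource.
candidates = [["movie", "film"], ["creator", "director", "filmmaker"]]
emit = [{"movie": 0.5, "film": 0.5}, {"creator": 0.4, "director": 0.3, "filmmaker": 0.3}]

log_trans = make_transition(triple_cooccurrence(triples))
score, path = viterbi(candidates, lambda i, w: math.log(emit[i][w]), log_trans)
print(path)  # ['film', 'director']
```

In this toy example, the co-occurrence of "film" and "director" in two triples outweighs the slightly higher emission weight of the original keyword "creator". As a result, the query is rewritten toward the vocabulary actually used in the data.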
