Keyword Query Expansion on Linked Data Using Linguistic and Semantic Features

Effective search over structured information from textual user input is important in many applications. Query expansion methods augment a user's original query with alternative query elements of similar meaning, increasing the chance of retrieving appropriate resources. In this work, we introduce a number of new query expansion features based on semantic and linguistic inference over Linked Open Data. We evaluate the effectiveness of each feature individually, as well as of their combinations, using several machine learning approaches. The evaluation is carried out on a training dataset extracted from the QALD question answering benchmark. Furthermore, we propose an optimized linear combination of linguistic and lightweight semantic features to predict the usefulness of each expansion candidate. Our experimental study shows a considerable improvement in precision and recall over baseline approaches.
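As a rough illustration of what a linear combination of features for ranking expansion candidates could look like, the following Python sketch scores each candidate as a weighted sum of its feature values and keeps those above a threshold. The feature names (`synonym`, `hypernym`, `same_as`), the weights, the bias, and the threshold are all hypothetical placeholders chosen for this example; they are not the features or learned values from the work itself.

```python
from dataclasses import dataclass, field


@dataclass
class Candidate:
    """An expansion candidate with illustrative feature values."""
    term: str
    features: dict = field(default_factory=dict)


# Hypothetical weights and bias; in practice these would be learned
# from labelled training data rather than set by hand.
WEIGHTS = {"synonym": 0.6, "hypernym": 0.2, "same_as": 0.8}
BIAS = -0.5


def score(candidate, weights=WEIGHTS, bias=BIAS):
    """Linear score: bias plus the weighted sum of feature values."""
    return bias + sum(
        weights.get(name, 0.0) * value
        for name, value in candidate.features.items()
    )


def expand(query_term, candidates, threshold=0.0):
    """Return the query term followed by candidates scoring above threshold,
    ordered from highest to lowest score."""
    kept = [c for c in candidates if score(c) > threshold]
    kept.sort(key=score, reverse=True)
    return [query_term] + [c.term for c in kept]


if __name__ == "__main__":
    candidates = [
        Candidate("movie", {"synonym": 1.0}),
        Candidate("artwork", {"hypernym": 1.0}),
        Candidate("motion picture", {"synonym": 1.0, "same_as": 1.0}),
    ]
    print(expand("film", candidates))
```

In this toy setup, a candidate supported only by a weak feature falls below the threshold and is dropped, while candidates backed by stronger or multiple features are kept and ranked first.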
