Question answering on interlinked data

The Data Web contains a wealth of knowledge on a large number of domains. Question answering over interlinked data sources is challenging due to two inherent characteristics. First, different datasets employ heterogeneous schemas and each one may only contain a part of the answer for a certain question. Second, constructing a federated formal query across different datasets requires exploiting links between the different datasets on both the schema and instance levels. We present a question answering system, which transforms user supplied queries (i.e. natural language sentences or keywords) into conjunctive SPARQL queries over a set of interlinked data sources. The contribution of this paper is two-fold: Firstly, we introduce a novel approach for determining the most suitable resources for a user-supplied query from different datasets (disambiguation). We employ a hidden Markov model, whose parameters were bootstrapped with different distribution functions. Secondly, we present a novel method for constructing a federated formal queries using the disambiguated resources and leveraging the linking structure of the underlying datasets. This approach essentially relies on a combination of domain and range inference as well as a link traversal method for constructing a connected graph which ultimately renders a corresponding SPARQL query. The results of our evaluation with three life-science datasets and 25 benchmark queries demonstrate the effectiveness of our approach.

[1]  Ellen M. Voorhees,et al.  The TREC-8 Question Answering Track Report , 1999, TREC.

[2]  Xiaohui Yu,et al.  Query segmentation using conditional random fields , 2009, KEYS '09.

[3]  Ravi Kumar,et al.  Searching with context , 2006, WWW '06.

[4]  Haofen Wang,et al.  Semplore: A scalable IR approach to search the Web of Data , 2009, J. Web Semant..

[5]  Stefan Decker,et al.  Sig.ma: Live views on the Web of Data , 2010, J. Web Semant..

[6]  Bamshad Mobasher,et al.  Personalized recommendation in social tagging systems using hierarchical clustering , 2008, RecSys '08.

[7]  Robert E. Tarjan,et al.  Finding Minimum Spanning Trees , 1976, SIAM J. Comput..

[8]  Daniel Gayo-Avello,et al.  On the Fly Query Entity Decomposition Using Snippets , 2010, ArXiv.

[9]  Steve Lawrence,et al.  Context in Web Search , 2000, IEEE Data Eng. Bull..

[10]  Timothy W. Finin,et al.  Swoogle: a search and metadata engine for the semantic web , 2004, CIKM '04.

[11]  Eyal Oren,et al.  Sindice.com: Weaving the Open Linked Data , 2007, ISWC/ASWC.

[12]  Ehud Rivlin,et al.  Placing search in context: the concept revisited , 2002, TOIS.

[13]  Enrico Motta,et al.  Integration of micro-gravity and geodetic data to constrain shallow system mass changes at Krafla Volcano, N Iceland , 2006 .

[14]  Jürgen Umbrich,et al.  Searching and browsing Linked Data with SWSE: The Semantic Web Search Engine , 2011, J. Web Semant..

[15]  Maria Teresa Pazienza,et al.  Semantic turkey: a browser-integrated environment for knowledge acquisition and management , 2012 .

[16]  Thanh Tran,et al.  Heterogeneous web data search using relevance-based on the fly data integration , 2012, WWW.

[17]  Peter Boros,et al.  Query Segmentation for Web Search , 2003, WWW.

[18]  Hang Li,et al.  Named entity recognition in query , 2009, SIGIR.

[19]  Haofen Wang,et al.  Top-k Exploration of Query Candidates for Efficient Keyword Search on Graph-Shaped (RDF) Data , 2009, 2009 IEEE 25th International Conference on Data Engineering.

[20]  Deniz Yuret,et al.  Word Sense Disambiguation for Information Retrieval , 1999, AAAI/IAAI.

[21]  Yuzhong Qu,et al.  Searching Linked Objects with Falcons: Approach, Implementation and Evaluation , 2009, Int. J. Semantic Web Inf. Syst..

[22]  K. Pu,et al.  Keyword query cleaning , 2008, Proc. VLDB Endow..

[23]  Enrico Motta,et al.  Toward a New Generation of Semantic Web Applications , 2008, IEEE Intelligent Systems.

[24]  Yun Peng,et al.  Swoogle: A semantic web search and metadata engine , 2004, CIKM 2004.

[25]  Stefan Decker,et al.  Sig.ma: live views on the web of data , 2010, WWW '10.

[26]  Jens Lehmann,et al.  Template-based question answering over RDF data , 2012, WWW.

[27]  Axel-Cyrille Ngonga Ngomo,et al.  Extracting Multilingual Natural-Language Patterns for RDF Predicates , 2012, EKAW.

[28]  Philipp Cimiano,et al.  Pythia: Compositional Meaning Construction for Ontology-Based Question Answering on the Semantic Web , 2011, NLDB.

[29]  Stavros Christodoulakis,et al.  The OntoNL Framework for Natural Language Interface Generation and a Domain-Specific Application , 2007, DELOS.

[30]  David S. Wishart,et al.  DrugBank: a comprehensive resource for in silico drug discovery and exploration , 2005, Nucleic Acids Res..

[31]  Mitchell P. Marcus,et al.  Text Chunking using Transformation-Based Learning , 1995, VLC@ACL.

[32]  Andrew J. Viterbi,et al.  Error bounds for convolutional codes and an asymptotically optimum decoding algorithm , 1967, IEEE Trans. Inf. Theory.

[33]  Fuchun Peng,et al.  Unsupervised query segmentation using generative language models and wikipedia , 2008, WWW.