Semantic Interpretation of User Query for Question Answering on Interlinked Data

The Web of Data contains a wealth of knowledge spanning a large number of domains, yet retrieving data from these interlinked knowledge bases remains difficult. By taking the structure of data into account, the next generation of search engines is expected to move toward question answering systems that answer user questions directly. Developing a question answering system over interlinked data sources is still challenging, however, because of two inherent characteristics. First, different datasets employ heterogeneous schemas, and each may contain only part of the answer to a given question. Second, constructing a federated formal query across different datasets requires exploiting the links between them at both the schema and instance levels. This raises several challenges, including resource disambiguation, vocabulary mismatch, inference, and link traversal. In this dissertation, we address these challenges in order to build a question answering system for Linked Data. We present our question answering system, Sina, which transforms user-supplied queries (either natural language queries or keyword queries) into conjunctive SPARQL queries over a set of interlinked data sources. The contributions of this work are as follows:

1. A novel approach for determining the most suitable resources for a user-supplied query from different datasets (the disambiguation approach). We employ a Hidden Markov Model whose parameters are bootstrapped with different distribution functions.

2. A novel method for constructing federated formal queries from the disambiguated resources by leveraging the linking structure of the underlying datasets. This method relies on a combination of domain and range inference and a link traversal method to construct a connected graph, which is ultimately rendered as a corresponding SPARQL query.

3. Two contributions addressing the problem of vocabulary mismatch. First, we introduce a number of new query expansion features based on semantic and linguistic inferencing over Linked Data. We evaluate the effectiveness of each feature individually, as well as in combination, employing Support Vector Machines and Decision Trees. Second, we propose a novel method for automatic query expansion, which employs a Hidden Markov Model to obtain the optimal tuples of derived words.

4. Two benchmarks for two different tasks, provided to the question answering community. The first targets question answering on interlinked datasets (i.e. federated queries over Linked Data). The second targets the vocabulary mismatch task.

We evaluate the accuracy of our approach using mean reciprocal rank, precision, recall, and F-measure on three interlinked life-science datasets as well as DBpedia. The results of this evaluation demonstrate the effectiveness of our approach. Moreover, we study the runtime of our approach in both its sequential and parallel implementations and draw conclusions about its scalability on Linked Data.
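For readers unfamiliar with the accuracy measures named above, the following is a minimal sketch of how mean reciprocal rank, precision, recall, and F-measure are commonly computed. It uses the standard textbook definitions (with F-measure taken as the balanced F1 score) and is not the evaluation code of the dissertation; the function names and the toy inputs are assumptions made for illustration only.

```python
from typing import Hashable, Sequence, Set, Tuple


def precision_recall_f1(
    retrieved: Set[Hashable], relevant: Set[Hashable]
) -> Tuple[float, float, float]:
    """Set-based precision, recall, and balanced F-measure (F1).

    Hypothetical helper: `retrieved` is what a system returned for one
    question, `relevant` is the gold-standard answer set.
    """
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def mean_reciprocal_rank(
    rankings: Sequence[Sequence[Hashable]],
    relevant_sets: Sequence[Set[Hashable]],
) -> float:
    """Average of 1/rank of the first relevant item in each ranking.

    A ranking that contains no relevant item contributes 0.
    """
    if not rankings:
        return 0.0
    total = 0.0
    for ranking, relevant in zip(rankings, relevant_sets):
        for rank, item in enumerate(ranking, start=1):
            if item in relevant:
                total += 1.0 / rank
                break
    return total / len(rankings)


if __name__ == "__main__":
    # Toy data, invented purely for illustration.
    p, r, f = precision_recall_f1({"a", "b", "c", "d"}, {"a", "b"})
    print(f"precision={p:.2f} recall={r:.2f} f1={f:.2f}")
    mrr = mean_reciprocal_rank([["x", "a"], ["b"], ["z"]], [{"a"}, {"b"}, {"q"}])
    print(f"MRR={mrr:.2f}")
```

Mean reciprocal rank suits ranked outputs such as candidate resources or candidate queries, where only the position of the first correct item matters, while precision, recall, and F-measure suit the unordered answer sets returned by an executed query.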
