Virtual Documents and Answer Priors in Keyword Search over Data Graphs

In keyword search over data graphs, an answer is a nonredundant subtree that contains the keywords of the query. Ranking of answers should take into account both their textual relevance and the significance of their semantic structure. A novel method for computing answer priors is developed and used in conjunction with query-dependent features. Since the space of all possible answers is huge, efficiency is also a major concern. A new algorithm that drastically reduces the search space is presented. It generates candidate answers by first selecting the top-n roots and, for each query keyword, the top-n nodes. The selection relies on a novel concept of virtual documents with weighted term frequencies. Markov random field models are used to rank the virtual documents and then the generated answers. The proposed approach outperforms existing systems on a standard evaluation framework.
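To make the idea of a virtual document with weighted term frequencies more concrete, here is a minimal Python sketch. It rests on assumptions that are not stated in the abstract: a node's virtual document is taken to be its own text plus the text of nodes within a fixed radius, with each term weighted by `decay ** distance`, and nodes are ranked per keyword by that raw weight rather than by a Markov random field model. The names `virtual_document`, `top_n_nodes`, `radius`, and `decay` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict, deque


def virtual_document(graph, text, node, radius=2, decay=0.5):
    """Return weighted term frequencies for `node`.

    Assumed scheme (not from the paper): terms of the node itself weigh 1.0,
    and terms of a node at distance d weigh decay ** d, up to `radius` hops.
    """
    weights = defaultdict(float)
    seen = {node}
    queue = deque([(node, 0)])
    while queue:
        current, dist = queue.popleft()
        for term in text.get(current, "").lower().split():
            weights[term] += decay ** dist
        if dist < radius:
            for neighbor in graph.get(current, ()):
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append((neighbor, dist + 1))
    return dict(weights)


def top_n_nodes(graph, text, keyword, n=2, radius=2, decay=0.5):
    """Select the n nodes whose virtual documents weigh `keyword` highest.

    A plain weighted term frequency stands in for the ranking model here.
    """
    scored = []
    for node in text:
        vdoc = virtual_document(graph, text, node, radius, decay)
        weight = vdoc.get(keyword.lower(), 0.0)
        if weight > 0:
            scored.append((weight, node))
    # Highest weight first; ties broken by node name for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [node for _, node in scored[:n]]


# Toy data graph (hypothetical): a paper linked to an author and a venue.
graph = {
    "paper1": ["author1", "conf1"],
    "author1": ["paper1"],
    "conf1": ["paper1"],
}
text = {
    "paper1": "keyword search graphs",
    "author1": "sagiv",
    "conf1": "wsdm",
}

paper_vdoc = virtual_document(graph, text, "paper1", radius=1)
sagiv_nodes = top_n_nodes(graph, text, "sagiv", n=2, radius=1)
```

In this toy graph, the node `paper1` does not contain the term `sagiv` itself but picks it up at reduced weight from its neighbor, so it can still be selected as a candidate for that keyword. Candidate answers would then be assembled only from the selected roots and keyword nodes instead of from the whole graph.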