Language models for keyword search over data graphs

In keyword search over data graphs, an answer is a non-redundant subtree that includes the given keywords. This paper focuses on improving the effectiveness of that type of search. A novel approach that combines language models with structural relevance is described. The proposed approach consists of three steps. First, language models are used to assign dynamic, query-dependent weights to the graph. Those weights complement static weights that are pre-assigned to the graph. Second, an existing algorithm returns candidate answers based on their weights. Third, the candidate answers are re-ranked by creating a language model for each one. The effectiveness of the proposed approach is verified on a benchmark of three datasets: IMDB, Wikipedia and Mondial. The proposed approach outperforms all existing systems on the three datasets, which is a testament to its robustness. It is also shown that the effectiveness can be further improved by augmenting keyword queries with very basic knowledge about the structure.

[1]  Shan Wang,et al.  Finding Top-k Min-Cost Connected Trees in Databases , 2007, 2007 IEEE 23rd International Conference on Data Engineering.

[2]  Yehoshua Sagiv,et al.  Exploratory keyword search on data graphs , 2010, SIGMOD Conference.

[3]  Gerhard Weikum,et al.  IQ: The Case for Iterative Querying for Knowledge , 2011, CIDR.

[4]  Surajit Chaudhuri,et al.  DBXplorer: a system for keyword-based search over relational databases , 2002, Proceedings 18th International Conference on Data Engineering.

[5]  Yehoshua Sagiv,et al.  Keyword proximity search in complex data graphs , 2008, SIGMOD Conference.

[6]  Philip S. Yu,et al.  BLINKS: ranked keyword searches on graphs , 2007, SIGMOD '07.

[7]  S. Sudarshan,et al.  Bidirectional Expansion For Keyword Search on Graph Databases , 2005, VLDB.

[8]  CHENGXIANG ZHAI,et al.  A study of smoothing methods for language models applied to information retrieval , 2004, TOIS.

[9]  Clement T. Yu,et al.  Effective keyword search in relational databases , 2006, SIGMOD Conference.

[10]  Eser Kandogan,et al.  Avatar semantic search: a database approach to information retrieval , 2006, SIGMOD Conference.

[11]  Gerhard Weikum,et al.  Language-model-based ranking for queries on RDF-graphs , 2009, CIKM.

[12]  Wolfgang Nejdl,et al.  IQP: Incremental query construction, a probabilistic approach , 2010, 2010 IEEE 26th International Conference on Data Engineering (ICDE 2010).

[13]  Xuemin Lin,et al.  SPARK2: Top-k Keyword Query in Relational Databases , 2007, IEEE Transactions on Knowledge and Data Engineering.

[14]  Alfred C. Weaver,et al.  A framework for evaluating database keyword search strategies , 2010, CIKM.

[15]  Luis Gravano,et al.  Efficient IR-Style Keyword Search over Relational Databases , 2003, VLDB.

[16]  Vagelis Hristidis,et al.  DISCOVER: Keyword Search in Relational Databases , 2002, VLDB.

[17]  S. Sudarshan,et al.  Keyword searching and browsing in databases using BANKS , 2002, Proceedings 18th International Conference on Data Engineering.

[18]  Ihab F. Ilyas,et al.  Expressive and flexible access to web-extracted data: a keyword-based structured query language , 2010, SIGMOD Conference.

[19]  Alfred C. Weaver,et al.  Structured data retrieval using cover density ranking , 2010, KEYS '10.