Efficient processing of keyword queries over graph databases for finding effective answers

Abstract In this paper, we study on effective and efficient processing of keyword-based queries over graph databases. To produce more relevant answers to a query than the previous approaches, we suggest a new answer tree structure which has no constraint on the number of keyword nodes chosen for each keyword in the query. For efficient search of answer trees on the large graph databases, we design an inverted list index to pre-compute and store connectivity and relevance information of nodes to keyword terms in the graph. We propose a query processing algorithm which aggregates from the pre-constructed inverted lists the best keyword nodes and root nodes to find top- k answer trees most relevant to the given query. We also enhance the method by extending the structure of the inverted list and adopting a relevance lookup table, which enables more accurate estimation of the relevance scores of candidate root nodes and efficient search of top- k answer trees. Performance evaluation by experiments with real graph datasets shows that the proposed method can find more effective top- k answers than the previous approaches and provides acceptable and scalable execution performance for various types of keyword queries on large graph databases.

[1]  Gerhard Weikum,et al.  IO-Top-k: index-access optimized top-k query processing , 2006, VLDB.

[2]  Gerhard Weikum,et al.  Top-k Query Evaluation with Probabilistic Guarantees , 2004, VLDB.

[3]  Jeffrey F. Naughton,et al.  Toward scalable keyword search over relational data , 2010, Proc. VLDB Endow..

[4]  F. Hwang,et al.  The Steiner Tree Problem , 2012 .

[5]  Yehoshua Sagiv,et al.  Keyword proximity search in complex data graphs , 2008, SIGMOD Conference.

[6]  Clement T. Yu,et al.  Effective keyword search in relational databases , 2006, SIGMOD Conference.

[7]  Sonia Bergamaschi,et al.  Keyword search over relational databases: a metadata approach , 2011, SIGMOD '11.

[8]  Haofen Wang,et al.  Top-k Exploration of Query Candidates for Efficient Keyword Search on Graph-Shaped (RDF) Data , 2009, 2009 IEEE 25th International Conference on Data Engineering.

[9]  Jeffrey Xu Yu,et al.  Keyword search in databases: the power of RDBMS , 2009, SIGMOD Conference.

[10]  S. Sudarshan,et al.  Bidirectional Expansion For Keyword Search on Graph Databases , 2005, VLDB.

[11]  Chang-Sup Park,et al.  Improving Keyword Match for Semantic Search , 2011, IEICE Trans. Inf. Syst..

[12]  Yufei Tao,et al.  Querying Communities in Relational Databases , 2009, 2009 IEEE 25th International Conference on Data Engineering.

[13]  Charles L. A. Clarke,et al.  Information Retrieval - Implementing and Evaluating Search Engines , 2010 .

[14]  Vagelis Hristidis,et al.  ObjectRank: Authority-Based Keyword Search in Databases , 2004, VLDB.

[15]  Luis Gravano,et al.  Evaluating top-k queries over Web-accessible databases , 2002, Proceedings 18th International Conference on Data Engineering.

[16]  S. Sudarshan,et al.  Keyword search on external memory data graphs , 2008, Proc. VLDB Endow..

[17]  Xuemin Lin,et al.  SPARK2: Top-k Keyword Query in Relational Databases , 2007, IEEE Transactions on Knowledge and Data Engineering.

[18]  Vagelis Hristidis,et al.  DISCOVER: Keyword Search in Relational Databases , 2002, VLDB.

[19]  Xuemin Lin,et al.  Keyword search on structured and semi-structured data , 2009, SIGMOD Conference.

[20]  Dana S. Richards,et al.  Steiner tree problems , 1992, Networks.

[21]  Aijun An,et al.  Keyword Search in Graphs: Finding r-cliques , 2011, Proc. VLDB Endow..

[22]  Luis Gravano,et al.  Efficient IR-Style Keyword Search over Relational Databases , 2003, VLDB.

[23]  Beng Chin Ooi,et al.  EASE: an effective 3-in-1 keyword search method for unstructured, semi-structured and structured data , 2008, SIGMOD Conference.

[24]  Philip S. Yu,et al.  BLINKS: ranked keyword searches on graphs , 2007, SIGMOD '07.

[25]  S. Sudarshan,et al.  Keyword searching and browsing in databases using BANKS , 2002, Proceedings 18th International Conference on Data Engineering.

[26]  Wolf-Tilo Balke,et al.  Towards efficient multi-feature queries in heterogeneous environments , 2001, Proceedings International Conference on Information Technology: Coding and Computing.

[27]  Surajit Chaudhuri,et al.  DBXplorer: a system for keyword-based search over relational databases , 2002, Proceedings 18th International Conference on Data Engineering.

[28]  Chang-Sup Park An Effective Keyword Search Method for Graph-Structured Data Using Extended Answer Structure , 2013, ICCSA.

[29]  Shan Wang,et al.  Finding Top-k Min-Cost Connected Trees in Databases , 2007, 2007 IEEE 23rd International Conference on Data Engineering.

[30]  Jeffrey Xu Yu,et al.  Keyword Search in Relational Databases: A Survey , 2010, IEEE Data Eng. Bull..

[31]  Moni Naor,et al.  Optimal aggregation algorithms for middleware , 2001, PODS.