Index structures and top-k join algorithms for native keyword search databases

For supporting keyword search on structured data, current solutions require large indexes to be built that redundantly store subgraphs called neighborhoods. Further, for exploring keyword search results, large graphs have to be loaded into memory. We propose a solution, which employs much more compact index structures for neighborhood lookups. Using these indexes, we reduce keyword search result exploration to the traditional database problem of top-k join processing, enabling results to be computed efficiently. In particular, this computation can be performed on data streams successively loaded from disk (i.e., does not require the entire input to be loaded at once into memory). For supporting this, we propose a top-k procedure based on the rank join operator, which not only computes the k-best results, but also selects query plans in a top-k fashion during the process. In experiments using large real-world datasets, our solution reduced storage requirements and also outperformed the state-of-the-art in terms of performance and scalability.

[1]  Jeffrey Xu Yu,et al.  Keyword Search in Relational Databases: A Survey , 2010, IEEE Data Eng. Bull..

[2]  Jeffrey Xu Yu,et al.  On-line exact shortest distance query processing , 2009, EDBT '09.

[3]  Edith Cohen,et al.  Reachability and distance queries via 2-hop labels , 2002, SODA '02.

[4]  Gerhard Weikum,et al.  HOPI: An Efficient Connection Index for Complex XML Document Collections , 2004, EDBT.

[5]  Shan Wang,et al.  Finding Top-k Min-Cost Connected Trees in Databases , 2007, 2007 IEEE 23rd International Conference on Data Engineering.

[6]  Jeffrey Xu Yu,et al.  Keyword search in databases: the power of RDBMS , 2009, SIGMOD Conference.

[7]  Clement T. Yu,et al.  Effective keyword search in relational databases , 2006, SIGMOD Conference.

[8]  Xuemin Lin,et al.  SPARK2: Top-k Keyword Query in Relational Databases , 2007, IEEE Transactions on Knowledge and Data Engineering.

[9]  Ihab F. Ilyas,et al.  A survey of top-k query processing techniques in relational database systems , 2008, CSUR.

[10]  Luis Gravano,et al.  Efficient IR-Style Keyword Search over Relational Databases , 2003, VLDB.

[11]  Vagelis Hristidis,et al.  ObjectRank: a system for authority-based search on databases , 2006, SIGMOD Conference.

[12]  Philip S. Yu,et al.  BLINKS: ranked keyword searches on graphs , 2007, SIGMOD '07.

[13]  Jeffrey Xu Yu,et al.  Ten thousand SQLs , 2010, Proc. VLDB Endow..

[14]  Beng Chin Ooi,et al.  EASE: an effective 3-in-1 keyword search method for unstructured, semi-structured and structured data , 2008, SIGMOD Conference.

[15]  Walid G. Aref,et al.  Joining Ranked Inputs in Practice , 2002, VLDB.

[16]  Haofen Wang,et al.  Top-k Exploration of Query Candidates for Efficient Keyword Search on Graph-Shaped (RDF) Data , 2009, 2009 IEEE 25th International Conference on Data Engineering.

[17]  S. Sudarshan,et al.  Bidirectional Expansion For Keyword Search on Graph Databases , 2005, VLDB.