Querying Communities in Relational Databases

Keyword search on relational databases gives users insights that they cannot easily obtain with traditional RDBMS techniques. Here, an l-keyword query is specified by a set of l keywords, {k1, k2, ..., kl}. It finds how the tuples that contain the keywords are connected in a relational database via the possible foreign key references. Conceptually, the task is to find structural information in a database graph, where nodes are tuples and edges are foreign key references. Existing work studied how to find connected trees for an l-keyword query. However, a tree may show only partial information about how the tuples that contain the keywords are connected. In this paper, we focus on finding communities for an l-keyword query. A community is an induced subgraph that contains all l keywords within a given distance. We propose new efficient algorithms that find all or the top-k communities for an l-keyword query while consuming little memory. For top-k l-keyword queries, our algorithm allows users to interactively enlarge k at run time. We conducted extensive performance studies using two large real datasets to confirm the efficiency of our algorithms.
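
To make the notion of a community concrete, the following is a minimal Python sketch under several assumptions that the abstract does not state. It treats the database graph as undirected, picks a candidate "center" tuple, and keeps every tuple that lies on some path of length at most `radius` from that center to a tuple containing a keyword. The function names (`bfs_within`, `community_around`), the center-based membership rule, and the toy author/paper data are all hypothetical; this illustrates the definition and is not the algorithm proposed in the paper.

```python
from collections import deque

def bfs_within(adj, sources, radius):
    """Hop distances from any source node, cut off at `radius` hops."""
    dist = {s: 0 for s in sources}
    queue = deque(sources)
    while queue:
        u = queue.popleft()
        if dist[u] == radius:
            continue  # do not expand beyond the distance bound
        for v in adj.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def community_around(adj, keyword_nodes, center, radius):
    """Induced subgraph around `center`, or None if some keyword is too far.

    keyword_nodes[i] is the set of tuples that contain keyword k_i.
    Assumption: a tuple belongs to the community if it lies on a path of
    length <= radius between the center and some keyword tuple.
    """
    from_center = bfs_within(adj, [center], radius)
    per_keyword = [bfs_within(adj, nodes, radius) for nodes in keyword_nodes]
    if any(center not in dist for dist in per_keyword):
        return None  # the center cannot reach every keyword within radius
    members = {
        v for v, dc in from_center.items()
        if any(v in dist and dc + dist[v] <= radius for dist in per_keyword)
    }
    # Induced subgraph: keep every edge whose two endpoints are both members.
    edges = {tuple(sorted((u, v))) for u in members for v in adj[u] if v in members}
    return members, edges

# Toy database graph: authors (a*), papers (p*), and "writes" tuples (w*)
# linked by foreign key references, stored as an undirected adjacency map.
links = [("a1", "w1"), ("w1", "p1"), ("a2", "w2"), ("w2", "p1"),
         ("a1", "w3"), ("w3", "p2")]
adj = {}
for u, v in links:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

# Keyword k1 occurs in tuple a1, keyword k2 in tuple a2 (a 2-keyword query).
keyword_nodes = [{"a1"}, {"a2"}]
result = community_around(adj, keyword_nodes, "p1", 2)
```

With this toy data, centering on `p1` with radius 2 yields a community containing both authors, their two "writes" tuples, and the shared paper, whereas centering on `p2` returns `None` because the tuple containing the second keyword is more than two hops away. Because the result is an induced subgraph, every foreign key reference among the selected tuples is retained, which is what distinguishes a community from a single connected tree.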
