Computing structural statistics by keywords in databases

Keyword search in RDBs has been extensively studied in recent years. The existing studies focused on finding all or top-k interconnected tuple-structures that contain keywords. In reality, the number of such interconnected tuple-structures for a keyword query can be large. It becomes very difficult for users to obtain any valuable information more than individual interconnected tuple-structures. Also, it becomes challenging to provide a similar mechanism like group-&-aggregate for those interconnected tuple-structures. In this paper, we study computing structural statistics keyword queries by extending the group-&-aggregate framework. We consider an RDB as a large directed graph where nodes represent tuples, and edges represent the links among tuples. Instead of using tuples as a member in a group to be grouped, we consider rooted subgraphs. Such a rooted subgraph represents an interconnected tuple-structure among tuples and some of the tuples contain keywords. The dimensions of the rooted subgraphs are determined by dimensional-keywords in a data driven fashion. Two rooted subgraphs are grouped into the same group if they are isomorphic based on the dimensions or in other words the dimensional-keywords. The scores of the rooted subgraphs are computed by a user-given score function if the rooted subgraphs contain some of general keywords. Here, the general keywords are used to compute scores rather than determining dimensions. The aggregates are computed using an SQL aggregate function for every group based on the scores computed. We give our motivation using a real dataset. We propose new approaches to compute structural statistics keyword queries, perform extensive performance studies using two large real datasets and a large synthetic dataset, and confirm the effectiveness and efficiency of our approach.

[1]  Vagelis Hristidis,et al.  ObjectRank: Authority-Based Keyword Search in Databases , 2004, VLDB.

[2]  Yin Yang,et al.  Keyword search on relational data streams , 2007, SIGMOD '07.

[3]  Bernard Dousset,et al.  DocCube: Multi-dimensional visualisation and exploration of large document sets , 2003, J. Assoc. Inf. Sci. Technol..

[4]  Vagelis Hristidis,et al.  DISCOVER: Keyword Search in Relational Databases , 2002, VLDB.

[5]  Vagelis Hristidis,et al.  Authority-based keyword search in databases , 2008, TODS.

[6]  Koichi Takeda,et al.  A method for online analytical processing of text data , 2007, CIKM '07.

[7]  Jian Pei,et al.  Answering aggregate keyword queries on relational databases using minimal group-bys , 2009, EDBT '09.

[8]  S. Sudarshan,et al.  Keyword searching and browsing in databases using BANKS , 2002, Proceedings 18th International Conference on Data Engineering.

[9]  Clement T. Yu,et al.  Effective keyword search in relational databases , 2006, SIGMOD Conference.

[10]  Sandeep Tata,et al.  SQAK: doing more with keywords , 2008, SIGMOD Conference.

[11]  Beng Chin Ooi,et al.  EASE: an effective 3-in-1 keyword search method for unstructured, semi-structured and structured data , 2008, SIGMOD Conference.

[12]  Krithi Ramamritham,et al.  Keyword Search over Dynamic Categorized Information , 2009, 2009 IEEE 25th International Conference on Data Engineering.

[13]  Yehoshua Sagiv,et al.  Keyword proximity search in complex data graphs , 2008, SIGMOD Conference.

[14]  Luis Gravano,et al.  Efficient IR-Style Keyword Search over Relational Databases , 2003, VLDB.

[15]  Berthold Reinwald,et al.  Towards keyword-driven analytical processing , 2007, SIGMOD '07.

[16]  S. Sudarshan,et al.  Bidirectional Expansion For Keyword Search on Graph Databases , 2005, VLDB.

[17]  Philip S. Yu,et al.  BLINKS: ranked keyword searches on graphs , 2007, SIGMOD '07.

[18]  S. E. Dreyfus,et al.  The steiner problem in graphs , 1971, Networks.

[19]  Vikram Gadi,et al.  SPARK2: Top-k Keyword Query in Relational Databases , 2012 .

[20]  Jeffrey Xu Yu,et al.  Keyword search in databases: the power of RDBMS , 2009, SIGMOD Conference.

[21]  Jeffrey Xu Yu,et al.  Computing Structural Statistics by Keywords in Databases , 2012, IEEE Trans. Knowl. Data Eng..

[22]  Berthold Reinwald,et al.  Multidimensional content eXploration , 2008, Proc. VLDB Endow..

[23]  S. Sudarshan,et al.  Keyword search on external memory data graphs , 2008, Proc. VLDB Endow..

[24]  Surajit Chaudhuri,et al.  DBXplorer: a system for keyword-based search over relational databases , 2002, Proceedings 18th International Conference on Data Engineering.

[25]  Yehoshua Sagiv,et al.  Finding and approximating top-k answers in keyword proximity search , 2006, PODS '06.

[26]  Yufei Tao,et al.  Finding frequent co-occurring terms in relational keyword search , 2009, EDBT '09.

[27]  Bo Zhao,et al.  Text Cube: Computing IR Measures for Multidimensional Text Database Analysis , 2008, 2008 Eighth IEEE International Conference on Data Mining.

[28]  Yufei Tao,et al.  Querying Communities in Relational Databases , 2009, 2009 IEEE 25th International Conference on Data Engineering.

[29]  Jiawei Han,et al.  Topic Cube: Topic Modeling for OLAP on Multidimensional Text Databases , 2009, SDM.

[30]  Shan Wang,et al.  Finding Top-k Min-Cost Connected Trees in Databases , 2007, 2007 IEEE 23rd International Conference on Data Engineering.