Answering aggregate keyword queries on relational databases using minimal group-bys

Keyword search has been recently extended to relational databases to retrieve information from text-rich attributes. However, all the existing methods focus on finding individual tuples matching a set of query keywords from one table or the join of multiple tables. In this paper, we motivate a novel problem of aggregate keyword search: finding minimal group-bys covering a set of query keywords well, which is useful in many applications. We develop two interesting approaches to tackle the problem, and further extend our methods to allow partial matches. An extensive empirical evaluation using both real data sets and synthetic data sets is reported to verify the effectiveness of aggregate keyword search and the efficiency of our methods.

[1]  Raymond T. Ng,et al.  Iceberg-cube computation with PC clusters , 2001, SIGMOD '01.

[2]  Rajeev Motwani,et al.  Computing Iceberg Queries Efficiently , 1998, VLDB.

[3]  Gerhard Weikum DB&IR: both sides now , 2007, SIGMOD '07.

[4]  Yehoshua Sagiv,et al.  Finding and approximating top-k answers in keyword proximity search , 2006, PODS '06.

[5]  Jian Pei,et al.  Efficient computation of Iceberg cubes with complex measures , 2001, SIGMOD '01.

[6]  Berthold Reinwald,et al.  Towards keyword-driven analytical processing , 2007, SIGMOD '07.

[7]  S. Sudarshan,et al.  Bidirectional Expansion For Keyword Search on Graph Databases , 2005, VLDB.

[8]  Ricardo Baeza-Yates,et al.  Information Retrieval: Data Structures and Algorithms , 1992 .

[9]  S. Sudarshan,et al.  Keyword searching and browsing in databases using BANKS , 2002, Proceedings 18th International Conference on Data Engineering.

[10]  David S. Johnson,et al.  Computers and Intractability: A Guide to the Theory of NP-Completeness , 1978 .

[11]  Shan Wang,et al.  Finding Top-k Min-Cost Connected Trees in Databases , 2007, 2007 IEEE 23rd International Conference on Data Engineering.

[12]  Jeffrey Xu Yu,et al.  Keyword Search in Relational Databases: A Survey , 2010, IEEE Data Eng. Bull..

[13]  Clifford Stein,et al.  Introduction to Algorithms, 2nd edition. , 2001 .

[14]  Sihem Amer-Yahia,et al.  Report on the DB/IR panel at SIGMOD 2005 , 2005, SGMD.

[15]  Gerhard Weikum,et al.  Integrating DB and IR Technologies: What is the Sound of One Hand Clapping? , 2005, CIDR.

[16]  Raghu Ramakrishnan,et al.  Bottom-up computation of sparse and Iceberg CUBE , 1999, SIGMOD '99.

[17]  Anthony K. H. Tung,et al.  A graph method for keyword-based selection of the top-K databases , 2008, SIGMOD Conference.

[18]  Clement T. Yu,et al.  Effective keyword search in relational databases , 2006, SIGMOD Conference.

[19]  Hamid Pirahesh,et al.  Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Totals , 1996, Data Mining and Knowledge Discovery.

[20]  Divyakant Agrawal,et al.  Range cube: efficient cube computation by exploiting data correlation , 2004, Proceedings. 20th International Conference on Data Engineering.

[21]  Philip S. Yu,et al.  BLINKS: ranked keyword searches on graphs , 2007, SIGMOD '07.

[22]  Anthony K. H. Tung,et al.  Effective keyword-based selection of relational databases , 2007, SIGMOD '07.

[23]  Luis Gravano,et al.  Efficient IR-Style Keyword Search over Relational Databases , 2003, VLDB.

[24]  Jiawei Han,et al.  Star-Cubing: Computing Iceberg Cubes by Top-Down and Bottom-Up Integration , 2003, Very Large Data Bases Conference.

[25]  RamakrishnanRaghu,et al.  Bottom-up computation of sparse and Iceberg CUBE , 1999 .

[26]  Surajit Chaudhuri,et al.  DBXplorer: a system for keyword-based search over relational databases , 2002, Proceedings 18th International Conference on Data Engineering.

[27]  Vagelis Hristidis,et al.  DISCOVER: Keyword Search in Relational Databases , 2002, VLDB.