Aggregate keyword search on large relational databases

Keyword search has recently been extended to relational databases to retrieve information from text-rich attributes. However, all existing methods focus on finding individual tuples that match a set of query keywords, drawn either from one table or from the join of multiple tables. In this paper, we motivate a novel problem, aggregate keyword search: finding minimal group-bys that cover a set of query keywords well, which is useful in many applications. We develop two approaches to tackle the problem. We further extend our methods to allow partial matches and matches using a keyword ontology. An extensive empirical evaluation on both real and synthetic data sets verifies the effectiveness of aggregate keyword search and the efficiency of our methods.
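To make the problem statement concrete, the following is a minimal brute-force sketch, not one of the two approaches developed in the paper. It assumes one plausible reading of the terms used above: a group-by fixes a value or a wildcard `*` on each dimension, it covers the query if the tuples it groups together contain every keyword in their text attribute, and it is minimal if no strictly more specific group-by also covers the query. The function name `minimal_group_bys`, the table layout, and the sample rows are all hypothetical.

```python
from itertools import product


def minimal_group_bys(rows, dims, text_attr, keywords):
    """Brute-force sketch: enumerate every group-by, keep the minimal covering ones."""
    keywords = {k.lower() for k in keywords}
    # Each dimension takes either one of its observed values or "*" (any value).
    domains = [sorted({r[d] for r in rows}) + ["*"] for d in dims]

    def matches(cell, row):
        # A row belongs to a group-by if it agrees on every non-wildcard dimension.
        return all(c == "*" or row[d] == c for d, c in zip(dims, cell))

    def covers(cell):
        # Assumed coverage rule: the grouped rows together contain all keywords.
        words = set()
        for row in rows:
            if matches(cell, row):
                words.update(row[text_attr].lower().split())
        return keywords <= words

    def more_specific(a, b):
        # True if a differs from b only by fixing values where b has wildcards.
        return a != b and all(y == "*" or x == y for x, y in zip(a, b))

    covering = [c for c in product(*domains) if covers(c)]
    # Assumed minimality rule: no strictly more specific group-by also covers.
    return [c for c in covering
            if not any(more_specific(o, c) for o in covering)]


# Hypothetical sample table with two dimensions and one text-rich attribute.
rows = [
    {"month": "Dec", "state": "Texas", "event": "space shuttle exhibition"},
    {"month": "Dec", "state": "Texas", "event": "motorcycle rally"},
    {"month": "Dec", "state": "Michigan", "event": "motorcycle show"},
    {"month": "Oct", "state": "Texas", "event": "space museum"},
]

result = minimal_group_bys(rows, ("month", "state"), "event", {"space", "motorcycle"})
print(result)
```

On this sample, no single tuple contains both keywords, so tuple-level keyword search returns nothing, while the group-by (Dec, Texas) covers both. The enumeration is exponential in the number of dimensions, so it only illustrates the problem definition and says nothing about how to solve it efficiently on large databases.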
