Efficient Keyword-Based Search for Top-K Cells in Text Cube

Previous studies on supporting free-form keyword queries over RDBMSs provide users with linked structures (e.g., a set of joined tuples) that are relevant to a given keyword query. Most of them focus on ranking individual tuples from one table or joins of multiple tables containing a set of keywords. In this paper, we study the problem of keyword search in a data cube with text-rich dimension(s) (so-called text cube). The text cube is built on a multidimensional text database, where each row is associated with some text data (a document) and other structural dimensions (attributes). A cell in the text cube aggregates a set of documents with matching attribute values in a subset of dimensions. We define a keyword-based query language and an IR-style relevance model for scoring/ranking cells in the text cube. Given a keyword query, our goal is to find the top-k most relevant cells. We propose four approaches: inverted-index one-scan, document sorted-scan, bottom-up dynamic programming, and search-space ordering. The search-space ordering algorithm explores only a small portion of the text cube for finding the top-k answers, and enables early termination. Extensive experimental studies are conducted to verify the effectiveness and efficiency of the proposed approaches.

[1]  Eugene J. Shekita,et al.  Beyond basic faceted search , 2008, WSDM '08.

[2]  Yehoshua Sagiv,et al.  Finding and approximating top-k answers in keyword proximity search , 2006, PODS '06.

[3]  David R. Karger,et al.  Magnet: supporting navigation in semistructured data environments , 2005, SIGMOD '05.

[4]  Amit Singhal,et al.  AT&T at TREC-7 , 1998, TREC.

[5]  Bo Zhao,et al.  Text Cube: Computing IR Measures for Multidimensional Text Database Analysis , 2008, 2008 Eighth IEEE International Conference on Data Mining.

[6]  Yufei Tao,et al.  Querying Communities in Relational Databases , 2009, 2009 IEEE 25th International Conference on Data Engineering.

[7]  Surajit Chaudhuri,et al.  DBXplorer: a system for keyword-based search over relational databases , 2002, Proceedings 18th International Conference on Data Engineering.

[8]  Jun Rao,et al.  Dynamic faceted search for discovery-driven analysis , 2008, CIKM '08.

[9]  Vikram Gadi,et al.  SPARK2: Top-k Keyword Query in Relational Databases , 2012 .

[10]  Moni Naor,et al.  Optimal aggregation algorithms for middleware , 2001, PODS '01.

[11]  Gerhard Weikum,et al.  Integrating DB and IR Technologies: What is the Sound of One Hand Clapping? , 2005, CIDR.

[12]  Shan Wang,et al.  Finding Top-k Min-Cost Connected Trees in Databases , 2007, 2007 IEEE 23rd International Conference on Data Engineering.

[13]  Mukesh K. Mohania,et al.  Retrieval]: Query formulation, search process , 2022 .

[14]  Yi Zhang,et al.  Personalized interactive faceted search , 2008, WWW.

[15]  Jaime G. Carbonell,et al.  Retrieval and feedback models for blog feed search , 2008, SIGIR '08.

[16]  Berthold Reinwald,et al.  DBPubs: multidimensional exploration of database publications , 2008, Proc. VLDB Endow..

[17]  Jiawei Han,et al.  Answering top-k queries with multi-dimensional selections: the ranking cube approach , 2006, VLDB.

[18]  Sihem Amer-Yahia,et al.  Report on the DB/IR panel at SIGMOD 2005 , 2005, SGMD.

[19]  Yehoshua Sagiv,et al.  Efficient Engines for Keyword Proximity Search , 2005, WebDB.

[20]  Yehoshua Sagiv,et al.  Efficiently Enumerating Results of Keyword Search , 2005, DBPL.

[21]  S. Sudarshan,et al.  Keyword searching and browsing in databases using BANKS , 2002, Proceedings 18th International Conference on Data Engineering.

[22]  Vagelis Hristidis,et al.  DISCOVER: Keyword Search in Relational Databases , 2002, VLDB.

[23]  Jian Pei,et al.  Answering aggregate keyword queries on relational databases using minimal group-bys , 2009, EDBT '09.

[24]  Mukesh K. Mohania,et al.  DynaCet: Building Dynamic Faceted Search Systems over Databases , 2009, 2009 IEEE 25th International Conference on Data Engineering.

[25]  Bo Zhao,et al.  TopCells: Keyword-based search of top-k aggregated documents in text cube , 2010, 2010 IEEE 26th International Conference on Data Engineering (ICDE 2010).

[26]  Gerhard Weikum DB&IR: both sides now , 2007, SIGMOD '07.

[27]  Gautam Das,et al.  Facetedpedia: dynamic generation of query-dependent faceted interfaces for wikipedia , 2010, WWW '10.

[28]  Marti A. Hearst,et al.  Finding the flow in web site search , 2002, CACM.

[29]  Ronald L. Rivest,et al.  Introduction to Algorithms , 1990 .

[30]  Rynson W. H. Lau,et al.  Knowledge and Data Engineering for e-Learning Special Issue of IEEE Transactions on Knowledge and Data Engineering , 2008 .

[31]  Clement T. Yu,et al.  Effective keyword search in relational databases , 2006, SIGMOD Conference.

[32]  Beng Chin Ooi,et al.  EASE: an effective 3-in-1 keyword search method for unstructured, semi-structured and structured data , 2008, SIGMOD Conference.

[33]  Yehoshua Sagiv,et al.  Keyword proximity search in complex data graphs , 2008, SIGMOD Conference.

[34]  Berthold Reinwald,et al.  Towards keyword-driven analytical processing , 2007, SIGMOD '07.

[35]  S. Sudarshan,et al.  Bidirectional Expansion For Keyword Search on Graph Databases , 2005, VLDB.

[36]  Jiawei Han,et al.  Topic Cube: Topic Modeling for OLAP on Multidimensional Text Databases , 2009, SDM.

[37]  Luis Gravano,et al.  Efficient IR-Style Keyword Search over Relational Databases , 2003, VLDB.

[38]  Berthold Reinwald,et al.  Multidimensional content eXploration , 2008, Proc. VLDB Endow..

[39]  Philip S. Yu,et al.  BLINKS: ranked keyword searches on graphs , 2007, SIGMOD '07.

[40]  Stephen E. Robertson,et al.  Okapi at TREC-7: Automatic Ad Hoc, Filtering, VLC and Interactive , 1998, TREC.