Paper information - Keyword search in text cube: Finding top-k relevant cells

We study keyword search in a data cube with one or more text-rich dimensions (a so-called text cube). The text cube is built on a multidimensional text database, where each row is associated with some text data (e.g., a document) and with other structural dimensions (attributes). A cell in the text cube aggregates the set of documents whose attribute values match in a subset of dimensions. A cell document is the concatenation of all documents in a cell. Given a keyword query, our goal is to find the top-k most relevant cells in the text cube, ranked according to the relevance scores of their cell documents w.r.t. the query. We define a keyword-based query language and apply an IR-style relevance model for scoring and ranking cell documents. We propose two efficient approaches to find the top-k answers. Both support a general class of IR-style relevance scoring formulas that satisfy certain basic and common properties. One approach spends more time on pre-processing and less time answering online queries; the other is more efficient in pre-processing but spends more time on online queries. Experimental studies on the ASRS dataset verify the efficiency and effectiveness of the proposed approaches.
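To make the definitions of cell, cell document, and top-k ranking concrete, the following is a minimal brute-force sketch in Python. It is not either of the two approaches proposed in the paper: it simply enumerates every cell, concatenates its documents, and scores the result. The names `enumerate_cells`, `relevance`, and `top_k_cells`, the `*` wildcard notation, and the length-normalized term-frequency score are all assumptions made for illustration; the paper only requires an IR-style scoring formula with certain common properties.

```python
import heapq
from collections import defaultdict
from itertools import product

STAR = "*"  # hypothetical wildcard: this dimension is aggregated away


def enumerate_cells(rows):
    """Map each cell to the list of documents it aggregates.

    rows: iterable of (attribute_tuple, document_text) pairs.
    A cell is a tuple holding either a concrete value or STAR per dimension.
    """
    cells = defaultdict(list)
    for attrs, doc in rows:
        # A row with d attributes falls into 2^d cells: each dimension
        # is either kept at its value or replaced by STAR.
        for mask in product((False, True), repeat=len(attrs)):
            cell = tuple(a if keep else STAR for a, keep in zip(attrs, mask))
            cells[cell].append(doc)
    return cells


def relevance(tokens, query_terms):
    """Toy score (an assumption): query-term occurrences / document length."""
    if not tokens:
        return 0.0
    return sum(tokens.count(t) for t in query_terms) / len(tokens)


def top_k_cells(rows, query, k):
    """Return the k highest-scoring (score, cell) pairs, best first."""
    query_terms = query.lower().split()
    scored = []
    for cell, docs in enumerate_cells(rows).items():
        # The cell document is the concatenation of all documents in the cell.
        tokens = " ".join(docs).lower().split()
        scored.append((relevance(tokens, query_terms), cell))
    return heapq.nlargest(k, scored)


if __name__ == "__main__":
    # Made-up rows; the attribute names and texts are not from the ASRS dataset.
    sample_rows = [
        (("B737", "landing"), "engine failure during landing"),
        (("B737", "takeoff"), "bird strike on takeoff"),
        (("A320", "landing"), "engine fire warning on landing"),
    ]
    for score, cell in top_k_cells(sample_rows, "engine", 2):
        print(cell, round(score, 3))
```

This baseline touches every non-empty cell for every query, and the number of cells grows exponentially with the number of dimensions, which is the cost that motivates trading pre-processing time against online query time as the abstract describes.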
