
Answering top-k queries with multi-dimensional selections: the ranking cube approach

In many real applications, a top-k query often consists of two components that reflect a user's preference: a selection condition and a ranking function. A user may not only propose ad hoc ranking functions but also focus on different subsets of the data that are of interest. In many cases, a user may want to study the data thoroughly by performing a multi-dimensional analysis of the top-k query results. Previous work on top-k query processing has mainly focused on optimizing data access according to the ranking function alone, and the problem of efficiently answering top-k queries with multi-dimensional selections has not yet been well addressed. This paper proposes a new computational model, called the ranking cube, for efficiently answering top-k queries with multi-dimensional selections. We define a rank-aware measure for the cube that captures our goal of supporting multi-dimensional ranking analysis. Based on the ranking cube, we develop an efficient query algorithm that progressively retrieves data blocks until the top-k results are found. The curse of dimensionality is a well-known challenge for data cubes, and we cope with it by introducing a new technique called ranking fragments. Our experiments on Microsoft SQL Server 2005 show that the proposed approaches significantly outperform previous methods.

Jiawei Han | Hong Cheng | Dong Xin | Xiaolei Li
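The abstract describes the query algorithm only at a high level: it progressively retrieves data blocks until the top-k results are found. The Python sketch below illustrates that idea under simplifying assumptions that are not taken from the paper. Ranking attributes are normalized to [0, 1] and partitioned into an equi-width grid of blocks. The ranking function is a linear weighted sum with non-negative weights, and smaller scores rank higher. Each cube cell simply stores, per block, the ids of the tuples it contains. The names `build_ranking_cube`, `top_k`, `BINS`, and the sample data are hypothetical. Because the score of a block's lower corner bounds the scores of all its tuples from below, blocks can be visited in ascending bound order, and the scan can stop once the next bound is no better than the current k-th score.

```python
import heapq
import random
from collections import defaultdict
from itertools import combinations

BINS = 8  # hypothetical grid resolution per ranking dimension


def block_of(point, bins=BINS):
    """Map a point with coordinates in [0, 1] to its grid-block coordinates."""
    return tuple(min(int(x * bins), bins - 1) for x in point)


def build_ranking_cube(rows, sel_dims, bins=BINS):
    """Hypothetical simplified cube: for every subset of selection dimensions
    and every value combination, map grid block -> ids of tuples in that block."""
    cube = defaultdict(lambda: defaultdict(list))
    for tid, (sel, point) in enumerate(rows):
        blk = block_of(point, bins)
        for r in range(len(sel_dims) + 1):
            for dims in combinations(sel_dims, r):
                key = tuple((d, sel[d]) for d in dims)
                cube[key][blk].append(tid)
    return cube


def top_k(cube, rows, sel_dims, selection, weights, k, bins=BINS):
    """Return the k (score, tid) pairs with the smallest weighted-sum score
    among tuples matching `selection`; weights must be non-negative."""
    if k <= 0:
        return []
    key = tuple((d, selection[d]) for d in sel_dims if d in selection)
    blocks = cube.get(key, {})

    def lower_bound(blk):
        # Score of the block's lower corner; no tuple inside can score lower.
        return sum(w * (i / bins) for w, i in zip(weights, blk))

    heap = []  # max-heap via negated scores: the best k tuples seen so far
    for blk in sorted(blocks, key=lower_bound):
        if len(heap) == k and lower_bound(blk) >= -heap[0][0]:
            break  # remaining blocks cannot beat the current k-th score
        for tid in blocks[blk]:
            score = sum(w * x for w, x in zip(weights, rows[tid][1]))
            if len(heap) < k:
                heapq.heappush(heap, (-score, tid))
            elif score < -heap[0][0]:
                heapq.heapreplace(heap, (-score, tid))
    return sorted((-neg, tid) for neg, tid in heap)


# Usage on synthetic data: two hypothetical selection dimensions
# and two ranking attributes drawn uniformly from [0, 1).
random.seed(0)
SEL_DIMS = ["type", "maker"]
rows = [({"type": random.choice("abc"), "maker": random.choice("xyz")},
         (random.random(), random.random())) for _ in range(2000)]
cube = build_ranking_cube(rows, SEL_DIMS)
result = top_k(cube, rows, SEL_DIMS, {"type": "a"}, weights=(0.3, 0.7), k=5)
```

This sketch materializes a cell for every subset of selection dimensions. Its size therefore grows exponentially with the number of dimensions, which is the curse of dimensionality the abstract mentions. The paper's ranking fragments are intended to address this, but the sketch does not model them.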
