Answering top-k queries with multi-dimensional selections: the ranking cube approach

Observed in many real applications, a top-k query often consists of two components to reflect a user's preference: a selection condition and a ranking function. A user may not only propose ad hoc ranking functions, but also use different interesting subsets of the data. In many cases, a user may want to have a thorough study of the data by initiating a multi-dimensional analysis of the top-k query results. Previous work on top-k query processing mainly focuses on optimizing data access according to the ranking function only. The problem of efficient answering top-k queries with multi-dimensional selections has not been well addressed yet.This paper proposes a new computational model, called ranking cube, for efficient answering top-k queries with multi-dimensional selections. We define a rank-aware measure for the cube, capturing our goal of responding to multi-dimensional ranking analysis. Based on the ranking cube, an efficient query algorithm is developed which progressively retrieves data blocks until the top-k results are found. The curse of dimensionality is a well-known challenge for the data cube and we cope with this difficulty by introducing a new technique of ranking fragments. Our experiments on Microsoft's SQL Server 2005 show that our proposed approaches have significant improvement over the previous methods.

[1]  Patrick E. O'Neil,et al.  Improved query performance with variant indexes , 1997, SIGMOD '97.

[2]  Jeffrey Scott Vitter,et al.  Efficient searching with linear constraints , 1998, J. Comput. Syst. Sci..

[3]  Surajit Chaudhuri,et al.  An overview of data warehousing and OLAP technology , 1997, SGMD.

[4]  Jonathan Goldstein,et al.  Processing queries by linear constraints , 1997, PODS '97.

[5]  Jiawei Han,et al.  High-Dimensional OLAP: A Minimal Cubing Approach , 2004, VLDB.

[6]  Gregory Piatetsky-Shapiro,et al.  Accurate estimation of the number of tuples satisfying a condition , 1984, SIGMOD '84.

[7]  Christian Böhm,et al.  Optimal Multidimensional Query Processing Using Tree Striping , 2000, DaWaK.

[8]  Seung-won Hwang,et al.  Minimal probing: supporting expensive predicates for top-k queries , 2002, SIGMOD '02.

[9]  Moni Naor,et al.  Optimal aggregation algorithms for middleware , 2001, PODS '01.

[10]  Yi Lin,et al.  Prediction Cubes , 2005, VLDB.

[11]  Sihem Amer-Yahia,et al.  Optimizing Queries on Compressed Bitmaps , 2000, VLDB.

[12]  Vagelis Hristidis,et al.  PREFER: a system for the efficient execution of multi-parametric ranked queries , 2001, SIGMOD '01.

[13]  John R. Smith,et al.  The onion technique: indexing for linear optimization queries , 2000, SIGMOD '00.

[14]  Luis Gravano,et al.  Top-k selection queries over relational databases: Mapping strategies and performance evaluation , 2002, TODS.

[15]  Yannis E. Ioannidis,et al.  Bitmap index design and evaluation , 1998, SIGMOD '98.

[16]  Kevin Chen-Chuan Chang,et al.  RankSQL: query algebra and optimization for relational top-k queries , 2005, SIGMOD '05.

[17]  Gerhard Weikum,et al.  Integrating DB and IR Technologies: What is the Sound of One Hand Clapping? , 2005, CIDR.

[18]  W. Rudin Principles of mathematical analysis , 1964 .

[19]  Ronald Fagin,et al.  Fuzzy queries in multimedia database systems , 1998, PODS '98.

[20]  Michael J. Carey,et al.  On saying “Enough already!” in SQL , 1997, SIGMOD '97.

[21]  Walid G. Aref,et al.  Rank-aware query optimization , 2004, SIGMOD '04.

[22]  Yixin Chen,et al.  Multi-Dimensional Regression Analysis of Time-Series Data Streams , 2002, VLDB.

[23]  Yuan-Chi Chang,et al.  The onion technique: indexing for linear optimization queries , 2000, SIGMOD 2000.

[24]  David J. DeWitt,et al.  Equi-depth multidimensional histograms , 1988, SIGMOD '88.