Efficient and generic evaluation of ranked queries

An important feature of existing methods for ranked top-k processing is that they avoid searching all the objects in the underlying dataset and limit the number of random accesses to the data. However, the performance of these methods degrades rapidly as the number of random accesses increases. In this paper, we propose a novel and general sequential access scheme for top-k query evaluation, which outperforms existing methods. We extend this scheme to efficiently answer top-k queries in subspaces and on dynamic data. We also study the "dual" form of top-k queries, called "ranking" queries, which return the rank of a specified record/object, and propose an exact solution as well as two approximate ones. An extensive empirical evaluation validates the robustness and efficiency of our techniques.
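The abstract does not spell out the proposed sequential access scheme, so the sketch below only pins down what the two query types compute. A top-k query asks which objects occupy the first k positions; its dual, the ranking query, asks at which position a given object sits. This is a brute-force baseline that scores every object, which is exactly the full scan the techniques above are meant to avoid. It assumes a weighted-sum scoring function where higher scores rank first, and models a subspace query as one whose weights cover only some attributes. The names `score`, `top_k` and `rank_of` and the toy data are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical record layout: (object id, attribute values).
Record = Tuple[str, Sequence[float]]


def score(values: Sequence[float], weights: Dict[int, float]) -> float:
    """Assumed weighted-sum scoring; a subspace query simply lists
    only the attribute indices it cares about."""
    return sum(w * values[i] for i, w in weights.items())


def top_k(data: List[Record], weights: Dict[int, float], k: int) -> List[str]:
    """Brute-force top-k: score every object, return the ids of the k best."""
    ranked = sorted(data, key=lambda r: score(r[1], weights), reverse=True)
    return [oid for oid, _ in ranked[:k]]


def rank_of(data: List[Record], weights: Dict[int, float], target: str) -> int:
    """Brute-force ranking query: 1 + number of objects that score
    strictly higher than the target object."""
    target_values = next(v for oid, v in data if oid == target)
    t = score(target_values, weights)
    return 1 + sum(1 for _, v in data if score(v, weights) > t)


# Toy dataset with three attributes per object (illustrative only).
data: List[Record] = [
    ("a", [0.9, 0.1, 0.5]),
    ("b", [0.4, 0.8, 0.2]),
    ("c", [0.7, 0.6, 0.9]),
    ("d", [0.2, 0.3, 0.1]),
]

full = {0: 0.5, 1: 0.3, 2: 0.2}  # query over all three attributes
sub = {1: 1.0}                   # subspace query over attribute 1 only

print(top_k(data, full, 2), rank_of(data, full, "b"))  # ['c', 'a'] 3
print(top_k(data, sub, 2), rank_of(data, sub, "a"))    # ['b', 'c'] 4
```

Both functions pay for a full pass over the data on every query; the techniques summarized in the abstract aim to answer the same two questions without scanning every object.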
