Best Position Algorithms for Top-k Queries

The general problem of answering top-k queries can be modeled using lists of data items sorted by their local scores. The most efficient algorithm proposed so far for answering top-k queries over sorted lists is the Threshold Algorithm (TA). However, TA may still incur many useless accesses to the lists. In this paper, we propose two new algorithms that stop much sooner. First, we propose the best position algorithm (BPA), which executes top-k queries more efficiently than TA. For any database instance (i.e., set of sorted lists), we prove that BPA stops at least as early as TA and that its execution cost is never higher than that of TA. We show that the position at which BPA stops can be (m-1) times lower than that of TA, where m is the number of lists. We also show that the execution cost of BPA can be (m-1) times lower than that of TA. Second, we propose the BPA2 algorithm, which is much more efficient than BPA. We show that the number of accesses to the lists done by BPA2 can be about (m-1) times lower than that of BPA. Our performance evaluation shows that over our test databases, BPA and BPA2 achieve significant performance gains in comparison with TA.
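
To make the comparison between TA and BPA concrete, the following is a minimal sketch rather than the paper's implementation. It assumes in-memory lists of (item, score) pairs sorted by descending local score, every item present in every list, and summation as the overall scoring function. The names `build_index`, `ta`, and `bpa` are hypothetical. The best-position rule used here (the greatest position in a list such that every position up to it has been seen under sorted or random access) is one reading of the idea and is not spelled out in the abstract.

```python
import heapq

def build_index(lists):
    """Simulated random access: per list, map item -> (position, score)."""
    return [{item: (pos, score) for pos, (item, score) in enumerate(lst)}
            for lst in lists]

def ta(lists, k):
    """Threshold Algorithm sketch. Returns (top-k items, stopping position)."""
    index = build_index(lists)
    overall = {}
    n = len(lists[0])
    for depth in range(n):
        for lst in lists:
            item, _ = lst[depth]                      # sorted access
            if item not in overall:                   # random accesses
                overall[item] = sum(ix[item][1] for ix in index)
        # TA threshold: scores at the current sorted-access position.
        threshold = sum(lst[depth][1] for lst in lists)
        top = heapq.nlargest(k, overall.items(), key=lambda kv: kv[1])
        if len(top) == k and top[-1][1] >= threshold:
            return top, depth + 1
    return heapq.nlargest(k, overall.items(), key=lambda kv: kv[1]), n

def bpa(lists, k):
    """Best-position sketch (assumed rule, see lead-in)."""
    index = build_index(lists)
    m, n = len(lists), len(lists[0])
    seen = [set() for _ in range(m)]   # positions seen in each list
    best = [-1] * m                    # 0-based best position per list
    overall = {}
    for depth in range(n):
        for lst in lists:
            item, _ = lst[depth]                      # sorted access
            if item not in overall:                   # random accesses
                overall[item] = sum(ix[item][1] for ix in index)
                for i, ix in enumerate(index):
                    seen[i].add(ix[item][0])          # record its position
        for i in range(m):
            # Advance while the next position has already been seen.
            while best[i] + 1 in seen[i]:
                best[i] += 1
        # Threshold from the scores at the best positions, not at `depth`.
        threshold = sum(lists[i][best[i]][1] for i in range(m))
        top = heapq.nlargest(k, overall.items(), key=lambda kv: kv[1])
        if len(top) == k and top[-1][1] >= threshold:
            return top, depth + 1
    return heapq.nlargest(k, overall.items(), key=lambda kv: kv[1]), n

l1 = [("a", 10), ("b", 8), ("c", 7), ("d", 6)]
l2 = [("b", 10), ("a", 9), ("d", 5), ("c", 4)]
print(ta([l1, l2], 1))   # ([('a', 19)], 2)
print(bpa([l1, l2], 1))  # ([('a', 19)], 1)
```

On this toy instance both functions return the same top-1 answer, but the best-position threshold drops to 17 after the first round of sorted accesses because random access has already revealed position 2 in each list, so the sketch stops one position earlier than TA.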
