A survey of top-k query processing techniques in relational database systems

Efficient processing of top-k queries is a crucial requirement in many interactive environments that involve massive amounts of data. In particular, efficient top-k processing in domains such as the Web, multimedia search, and distributed systems has a significant impact on performance. In this survey, we describe and classify top-k processing techniques in relational databases. We discuss the design dimensions of current techniques, including query models, data access methods, implementation levels, data and query certainty, and supported scoring functions, and we show how each dimension shapes the design of the underlying techniques. We also discuss top-k queries in the XML domain and show their connections to relational approaches.
