Being picky: processing top-k queries with set-defined selections

Focusing on the top-k items according to a ranking criterion is an important functionality in many query answering scenarios. The idea is to read only the necessary information, mostly from secondary storage, with the ultimate goal of achieving low latency. In this work, we consider processing such top-k queries under the constraint that the result items are members of a specific set, which is provided at query time. We call this restriction a set-defined selection criterion. Set-defined selections drastically influence the pros and cons of an id-ordered index vs. a score-ordered index. We present a mathematical model that decides at runtime which index to use, leading to a combined index. To improve latency around the break-even point of the two indices, we show how to benefit from a partitioned score-ordered index and present an algorithm that creates such partitions by analyzing query logs. Further performance gains can be obtained using approximate top-k results with tunable result quality. The presented approaches are evaluated using both real-world and synthetic data.
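The trade-off between the two index types can be illustrated with a minimal sketch. The abstract does not give the paper's cost model, so everything below is an assumption made for illustration: the function names, the in-memory data layout, and in particular the toy cost estimate in `choose_index`, which assumes that selected items are spread uniformly over the score order.

```python
import heapq


def topk_score_ordered(score_index, selection, k):
    """Scan (item_id, score) pairs in descending score order and stop
    as soon as k members of the selection set have been found.
    Returns the hits and the number of index entries scanned."""
    hits = []
    scanned = 0
    for item_id, score in score_index:
        scanned += 1
        if item_id in selection:
            hits.append((item_id, score))
            if len(hits) == k:
                break
    return hits, scanned


def topk_id_ordered(id_index, selection, k):
    """Look up every selected id in an id-keyed index and keep the k best.
    Returns the hits and the number of lookups performed."""
    candidates = [
        (item_id, id_index[item_id])
        for item_id in selection
        if item_id in id_index
    ]
    best = heapq.nlargest(k, candidates, key=lambda pair: pair[1])
    return best, len(selection)


def choose_index(n_items, selection_size, k, scan_cost=1.0, lookup_cost=1.0):
    """Toy runtime decision (hypothetical, NOT the paper's model).
    Under a uniform-spread assumption, a score-ordered scan reads
    roughly k * n / s entries, while the id-ordered index needs
    one lookup per selected item."""
    if selection_size == 0:
        return "id"
    expected_scan = min(n_items, k * n_items / selection_size)
    if expected_scan * scan_cost <= selection_size * lookup_cost:
        return "score"
    return "id"


# Small usage example with made-up data.
scores = {1: 0.9, 2: 0.8, 3: 0.7, 4: 0.6, 5: 0.5, 6: 0.4}
score_index = sorted(scores.items(), key=lambda pair: -pair[1])
selection = {2, 5, 6}

by_score, scanned = topk_score_ordered(score_index, selection, k=2)
by_id, lookups = topk_id_ordered(scores, selection, k=2)
```

In this sketch, a small selection set favors the id-ordered index (few lookups), while a large one favors the score-ordered scan (early termination after few entries). The region where the two estimates are close corresponds to the break-even point mentioned in the abstract.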

Sebastian Michel | Aleksandar Stupar
