Efficient top-k processing in large-scaled distributed environments

The rapid development of networking technologies has made it possible to construct a distributed database that involves a huge number of sites. Query processing in such a large-scaled system poses serious challenges beyond the scope of traditional distributed algorithms. In this paper, we propose a new algorithm BRANCA for performing top-k retrieval in these environments. Integrating two orthogonal methodologies ''semantic caching'' and ''routing indexes'', BRANCA is able to solve a query by accessing only a small number of servers. Our algorithmic findings are accompanied with a solid theoretical analysis, which rigorously proves the effectiveness of BRANCA. Extensive experiments verify that our technique outperforms the existing methods significantly.

[1]  D. Bartholomew,et al.  Linear Programming: Methods and Applications , 1970 .

[2]  Seung-won Hwang,et al.  Minimal probing: supporting expensive predicates for top-k queries , 2002, SIGMOD '02.

[3]  John R. Smith,et al.  The onion technique: indexing for linear optimization queries , 2000, SIGMOD '00.

[4]  Hector Garcia-Molina,et al.  Routing indices for peer-to-peer systems , 2002, Proceedings 22nd International Conference on Distributed Computing Systems.

[5]  Wolf-Tilo Balke,et al.  Progressive distributed top-k retrieval in peer-to-peer networks , 2005, 21st International Conference on Data Engineering (ICDE'05).

[6]  Luis Gravano,et al.  Evaluating top-k queries over web-accessible databases , 2004, TODS.

[7]  Ronald Fagin,et al.  Combining Fuzzy Information from Multiple Systems , 1999, J. Comput. Syst. Sci..

[8]  Beng Chin Ooi,et al.  BATON: A Balanced Tree Structure for Peer-to-Peer Networks , 2005, VLDB.

[9]  Kevin Chen-Chuan Chang,et al.  RankSQL: query algebra and optimization for relational top-k queries , 2005, SIGMOD '05.

[10]  Mark Handley,et al.  A scalable content-addressable network , 2001, SIGCOMM '01.

[11]  Surya Nepal,et al.  Query processing issues in image (multimedia) databases , 1999, Proceedings 15th International Conference on Data Engineering (Cat. No.99CB36337).

[12]  Luis Gravano,et al.  Top-k selection queries over relational databases: Mapping strategies and performance evaluation , 2002, TODS.

[13]  Zhe Wang,et al.  Efficient top-K query calculation in distributed networks , 2004, PODC '04.

[14]  Vijay Kumar,et al.  Semantic Caching and Query Processing , 2003, IEEE Trans. Knowl. Data Eng..

[15]  Gerhard Weikum,et al.  Probabilistic Ranking of Database Query Results , 2004, VLDB.

[16]  Hua-Gang Li,et al.  Efficient Processing of Distributed Top-k Queries , 2005, DEXA.

[17]  Gerhard Weikum,et al.  Top-k Query Evaluation with Probabilistic Guarantees , 2004, VLDB.

[18]  Steven Vajda,et al.  Linear Programming. Methods and Applications , 1964 .

[19]  Michael J. Carey,et al.  On saying “Enough already!” in SQL , 1997, SIGMOD '97.

[20]  Ronald Fagin,et al.  Fuzzy queries in multimedia database systems , 1998, PODS.

[21]  Divesh Srivastava,et al.  Ranked join indices , 2003, Proceedings 19th International Conference on Data Engineering (Cat. No.03CH37405).

[22]  Gerhard Weikum,et al.  KLEE: A Framework for Distributed Top-k Query Algorithms , 2005, VLDB.

[23]  Walid G. Aref,et al.  Rank-aware query optimization , 2004, SIGMOD '04.

[24]  Moni Naor,et al.  Optimal aggregation algorithms for middleware , 2001, PODS '01.

[25]  Donald Kossmann,et al.  The Skyline operator , 2001, Proceedings 17th International Conference on Data Engineering.

[26]  Yuguo Chen,et al.  Efficient maintenance of materialized top-k views , 2003, Proceedings 19th International Conference on Data Engineering (Cat. No.03CH37405).

[27]  David R. Karger,et al.  Chord: A scalable peer-to-peer lookup service for internet applications , 2001, SIGCOMM '01.

[28]  Vagelis Hristidis,et al.  PREFER: a system for the efficient execution of multi-parametric ranked queries , 2001, SIGMOD '01.

[29]  Luis Gravano,et al.  Evaluating Top-k Selection Queries , 1999, VLDB.

[30]  Walid G. Aref,et al.  Joining Ranked Inputs in Practice , 2002, VLDB.

[31]  Ronald Fagin,et al.  Incorporating User Preferences in Multimedia Queries , 1997, ICDT.

[32]  Hans-Peter Kriegel,et al.  The R*-tree: an efficient and robust access method for points and rectangles , 1990, SIGMOD '90.

[33]  Dimitrios Gunopulos,et al.  Answering top-k queries using views , 2006, VLDB.

[34]  John R. Smith,et al.  Supporting Incremental Join Queries on Ranked Inputs , 2001, VLDB.