On efficient top-k query processing in highly distributed environments

Recent advances in centralized database management systems show a trend towards supporting rank-aware query operators, such as top-k, that let users retrieve only the most interesting data objects. Supporting rank-aware queries in highly distributed environments remains a challenging problem. In this paper, we present a novel approach, called SPEERTO, for top-k query processing in large-scale peer-to-peer networks, where the dataset is horizontally distributed over the peers. To this end, we explore the applicability of the skyline operator for efficiently routing top-k queries in a large super-peer network. Relying on a thresholding scheme, SPEERTO returns the exact results progressively to the user, while minimizing the number of queried super-peers and the volume of transferred data. Finally, we propose several variations of SPEERTO that trade off transferred data volume against response time. Through simulations we demonstrate the feasibility of our approach.
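To illustrate why a skyline can be useful for routing top-k queries, the following minimal sketch computes the skyline of a small point set and checks that the best object under a linear scoring function with strictly positive weights always lies on it. This is not the SPEERTO algorithm: the data, the function names (`dominates`, `skyline`, `top_k`) and the convention that smaller values are better are assumptions made for illustration only. Note also that for k > 1 the skyline alone is not sufficient in general, since the second-best object may be dominated by the best one.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """a dominates b if a is no worse in every dimension and strictly
    better in at least one (assumption: smaller values are better)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def skyline(points: List[Point]) -> List[Point]:
    """Naive O(n^2) skyline: keep the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def top_k(points: List[Point], weights: Sequence[float], k: int) -> List[Point]:
    """Top-k under a linear scoring function (lower score ranks first)."""
    def score(p: Point) -> float:
        return sum(w * x for w, x in zip(weights, p))
    return sorted(points, key=score)[:k]


# Hypothetical two-dimensional dataset.
points = [(1.0, 9.0), (2.0, 4.0), (4.0, 3.0), (5.0, 5.0), (8.0, 1.0), (9.0, 9.0)]
sky = skyline(points)

# For any strictly positive weights, the top-1 object is a skyline point.
for weights in [(0.5, 0.5), (0.9, 0.1), (0.1, 0.9)]:
    best = top_k(points, weights, 1)[0]
    print(weights, best, best in sky)
```

The check holds because a dominated point always scores strictly worse than its dominator when all weights are positive, so a node that advertises only its skyline still reveals whether it can hold the best-ranked object.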
