Efficient Routing of Subspace Skyline Queries over Highly Distributed Data

Data are being generated at highly dynamic rates, making the cost of storing, processing, and updating them at a single central location excessive. The peer-to-peer (P2P) paradigm has emerged as a powerful model for organizing and searching large data repositories distributed over independent sources. Advanced query operators, such as skyline queries, are needed to help users cope with the huge amount of available data. A skyline query retrieves the set of nondominated data points in a multidimensional data set. A point dominates another if it is at least as good in every dimension and strictly better in at least one. Skyline query processing in P2P networks poses inherent challenges and demands nontraditional techniques, because content is distributed and no peer has global knowledge. Relying on a superpeer architecture, we propose SKYPEER, a threshold-based algorithm, together with its variants, for efficiently computing skyline points in arbitrary subspaces while reducing both computational time and the volume of transmitted data. Furthermore, we address the problem of routing skyline queries over the superpeer network and propose an efficient routing mechanism, SKYPEER+, which further improves performance by reducing the number of contacted superpeers. Finally, an extensive experimental evaluation shows that our approach performs efficiently and is a viable solution when a high degree of distribution is required.
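The abstract names a threshold-based algorithm for subspace skylines but does not spell out its details. The Python sketch below illustrates the general idea under stated assumptions. Smaller values are taken to be better in every dimension. The one-dimensional mapping f(p) = min over the queried dimensions and the stopping threshold are one common choice, assumed here for illustration and not necessarily SKYPEER's exact definitions. `superpeer_merge` is a hypothetical helper showing why a superpeer can combine its peers' local skylines. The sketch does not model SKYPEER+ routing.

```python
from collections.abc import Iterable, Sequence

Point = tuple[float, ...]


def dominates(p: Point, q: Point, dims: Sequence[int]) -> bool:
    """True if p dominates q on `dims` (smaller is better):
    p is no worse in every queried dimension and strictly better in at least one."""
    return all(p[i] <= q[i] for i in dims) and any(p[i] < q[i] for i in dims)


def subspace_skyline(points: Iterable[Point], dims: Sequence[int]) -> list[Point]:
    """Skyline of `points` restricted to the subspace `dims`, with early termination.

    Assumption (illustrative, not taken from the paper): points are scanned in
    ascending (f(p), sum over dims) order, where f(p) = min over dims of p[i].
    Any dominator of p then precedes p in the scan. The threshold t is the
    minimum, over skyline points s found so far, of max over dims of s[i].
    If f(p) > t, some skyline point is strictly smaller than p in every queried
    dimension. The same holds for every later point, so the scan can stop.
    """
    def f(p: Point) -> float:
        return min(p[i] for i in dims)

    ordered = sorted(points, key=lambda p: (f(p), sum(p[i] for i in dims)))
    skyline: list[Point] = []
    threshold = float("inf")
    for p in ordered:
        if f(p) > threshold:
            break  # p and all remaining points are dominated
        if any(dominates(s, p, dims) for s in skyline):
            continue  # dominated by an already accepted point
        skyline.append(p)
        threshold = min(threshold, max(p[i] for i in dims))
    return skyline


def superpeer_merge(local_skylines: Iterable[list[Point]], dims: Sequence[int]) -> list[Point]:
    """Hypothetical superpeer step: every global skyline point is also a local
    skyline point of the peer that stores it, so recomputing the skyline over
    the union of local skylines yields the exact global subspace skyline."""
    union = [p for local in local_skylines for p in local]
    return subspace_skyline(union, dims)


if __name__ == "__main__":
    peer_a = [(1, 9, 5), (4, 4, 4), (7, 2, 8), (8, 8, 8)]
    peer_b = [(2, 6, 1), (9, 9, 9), (3, 3, 7)]
    dims = (0, 1)  # query the subspace formed by the first two dimensions
    local = [subspace_skyline(peer_a, dims), subspace_skyline(peer_b, dims)]
    print(sorted(superpeer_merge(local, dims)))
```

Each peer transmits only its local subspace skyline instead of all of its points, which hints at how distributed processing can reduce the volume of transmitted data. In the example, (8, 8, 8) is never compared against the skyline because the threshold stops the scan at peer A.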
