Supporting efficient distributed skyline computation using skyline views

Skyline queries return a set of objects, or a skyline, that are not dominated by any other object. While skyline queries give users an intuitive query formulation, they may return too many results, especially for high-dimensional data. To tackle this problem, subspace skyline queries, which deal with a subset of dimensions, have recently been studied. To identify interesting skylines, users can iteratively refine multiple relevant subspaces for skyline queries. Existing work focuses primarily on supporting efficient subspace skyline computation in centralized databases. In contrast, this paper addresses subspace skyline computation in distributed environments such as the Web. Toward this goal, we use pre-computed subspace skylines stored as views in databases, called skyline views. Specifically, we propose distributed subspace skyline computation that minimizes the total access cost by leveraging the skyline views. Our experimental results validate that the proposed algorithms significantly outperform state-of-the-art algorithms on extensive synthetic datasets.
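To make the notions of dominance and subspace skyline concrete, the following is a minimal sketch of a naive, centralized skyline computation. It is not the distributed algorithm proposed in the paper and it does not use skyline views. It assumes that smaller values are preferred on every dimension; the names `dominates` and `subspace_skyline` and the hotel data are hypothetical, chosen only for illustration.

```python
from typing import Sequence

Point = tuple[float, ...]


def dominates(p: Point, q: Point, dims: Sequence[int]) -> bool:
    """Return True if p dominates q on the given dimensions.

    Assumption: smaller is better. p dominates q when p is no worse
    on every dimension in dims and strictly better on at least one.
    """
    no_worse = all(p[d] <= q[d] for d in dims)
    strictly_better = any(p[d] < q[d] for d in dims)
    return no_worse and strictly_better


def subspace_skyline(points: Sequence[Point], dims: Sequence[int]) -> list[Point]:
    """Naive quadratic skyline: keep points not dominated by any other point."""
    return [
        p for p in points
        if not any(dominates(q, p, dims) for q in points)
    ]


# Hypothetical data: (price, distance to the beach); smaller is better on both.
hotels = [(50, 8), (60, 3), (80, 2), (90, 5), (70, 8)]

full_skyline = subspace_skyline(hotels, dims=(0, 1))   # both dimensions
price_skyline = subspace_skyline(hotels, dims=(0,))    # subspace {price}
distance_skyline = subspace_skyline(hotels, dims=(1,)) # subspace {distance}
```

On this data the full-space skyline keeps three hotels, while each one-dimensional subspace keeps a single hotel, which illustrates how restricting the dimensions reduces the result size.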
