Finding Pareto Optimal Groups: Group-based Skyline

Skyline computation, which identifies the set of points that are not dominated by any other point, is particularly useful for multi-criteria data analysis and decision making. Traditional skyline computation, however, cannot answer queries that need to analyze not only individual points but also groups of points. To address this gap, we generalize the original skyline definition to the group-based skyline (G-Skyline), which represents Pareto optimal groups that are not dominated by any other group. To compute G-Skyline groups of k points efficiently, we present a novel structure, the directed skyline graph, which represents the points and captures the dominance relationships among them based on the first k skyline layers. We propose efficient algorithms to compute the first k skyline layers. We then present two heuristic algorithms that compute the G-Skyline groups using various pruning strategies: the point-wise algorithm and the unit group-wise algorithm. Experimental results on a real NBA dataset and on synthetic datasets show that G-Skyline is interesting and useful, and that our algorithms are efficient and scalable.
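To make the notions in the abstract concrete, the following is a minimal brute-force sketch in Python, not the point-wise or unit group-wise algorithm and not the directed skyline graph. It rests on several assumptions that the abstract does not state: smaller values are preferred in every dimension, all points are distinct, and a group g dominates a group h of the same size if the points of h can be reordered so that each point of g dominates or equals its counterpart, with at least one strict dominance. All function names are hypothetical.

```python
from itertools import combinations, permutations


def dominates(p, q):
    """True if p is no worse than q in every dimension and strictly
    better in at least one (assumption: smaller is better)."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def skyline(points):
    """Points that are not dominated by any other point."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def skyline_layers(points, k):
    """First k skyline layers: layer i is the skyline of what remains
    after removing layers 1..i-1."""
    layers = []
    remaining = list(points)
    while remaining and len(layers) < k:
        layer = skyline(remaining)
        layers.append(layer)
        remaining = [p for p in remaining if p not in layer]
    return layers


def group_dominates(g, h):
    """Assumed group dominance: some reordering of h pairs every point
    of g with a point it dominates or equals, with at least one strict."""
    for perm in permutations(h):
        pairs = list(zip(g, perm))
        no_worse = all(a == b or dominates(a, b) for a, b in pairs)
        if no_worse and any(dominates(a, b) for a, b in pairs):
            return True
    return False


def g_skyline_bruteforce(points, k):
    """Enumerate all k-point groups drawn from the first k skyline layers
    and keep those that no other such group dominates."""
    candidates = [p for layer in skyline_layers(points, k) for p in layer]
    groups = list(combinations(candidates, k))
    return [g for g in groups
            if not any(group_dominates(h, g) for h in groups if h != g)]


if __name__ == "__main__":
    pts = [(1, 4), (2, 2), (4, 1), (3, 3), (5, 5)]
    print(skyline_layers(pts, 2))
    print(g_skyline_bruteforce(pts, 2))
```

The enumeration is exponential in k and is only meant to illustrate the definitions on tiny inputs. Restricting candidates to the first k skyline layers follows the abstract's statement that the dominance structure is built on those layers.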
