Group-Based Skyline for Pareto Optimal Groups

Skyline computation, which identifies the set of skyline points that are not dominated by any other point, is particularly useful for multi-criteria data analysis and decision making. Traditional skyline computation, however, is inadequate for queries that need to analyze not only individual points but also groups of points. To address this gap, we generalize the original skyline definition to the group-based skyline (G-Skyline), which represents Pareto optimal groups that are not dominated by other groups. To compute G-Skyline groups consisting of $s$ points efficiently, we present a novel structure that represents the points in a directed skyline graph and captures the dominance relationships among the points based on the first $s$ skyline layers. We propose efficient algorithms to compute the first $s$ skyline layers. We then present two heuristic algorithms to efficiently compute the G-Skyline groups, the point-wise algorithm and the unit group-wise algorithm, both using various pruning strategies.
We observe that the number of G-Skyline groups of a dataset can be very large; we therefore propose the top-$k$ representative G-Skyline groups, based on the number of dominated points and the number of dominated groups, and present efficient algorithms for computing them. Experimental results on a real NBA dataset and on synthetic datasets show that G-Skyline is interesting and useful, and that our algorithms are efficient and scalable.
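
To make the notions of dominance and skyline layers concrete, the following is a minimal sketch and not the efficient algorithms proposed in the paper. It assumes that smaller values are better in every dimension, and the names `dominates` and `skyline_layers` are hypothetical helpers introduced only for illustration. Layer 1 is the ordinary skyline; each later layer is the skyline of the points left after removing the earlier layers.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(p: Sequence[float], q: Sequence[float]) -> bool:
    """Return True if p dominates q.

    Assumption: smaller is better in every dimension. p dominates q when
    p is no worse than q everywhere and strictly better somewhere.
    """
    no_worse = all(a <= b for a, b in zip(p, q))
    strictly_better = any(a < b for a, b in zip(p, q))
    return no_worse and strictly_better


def skyline_layers(points: List[Point], s: int) -> List[List[Point]]:
    """Peel off at most s skyline layers by naive pairwise comparison.

    This takes quadratic time per layer and is meant only to illustrate
    the definition, not to be efficient.
    """
    remaining = list(points)
    layers: List[List[Point]] = []
    while remaining and len(layers) < s:
        # A point belongs to the current layer if no remaining point dominates it.
        layer = [p for p in remaining
                 if not any(dominates(q, p) for q in remaining)]
        layers.append(layer)
        remaining = [p for p in remaining if p not in layer]
    return layers
```

For example, with the points `(1, 4)`, `(2, 2)`, `(4, 1)`, `(3, 3)`, `(4, 4)`, `(5, 5)` and `s = 2`, the first layer holds the three mutually non-dominated points and the second layer holds `(3, 3)`. Since a point on layer `i` has at least `i - 1` dominators, a point beyond the first `s` layers cannot belong to a group of `s` points that contains all of its dominators, which is presumably why the first `s` layers suffice for the structure described above.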
