Ranking the big sky: efficient top-k skyline computation on massive data

In many applications, the top-k skyline query is an important operation that returns the k skyline tuples with the highest domination scores from a potentially huge data space. Our analysis shows that existing algorithms cannot process top-k skyline queries on massive data efficiently. In this paper, we propose RSTS, a novel table-scan-based algorithm that computes top-k skyline results on massive data efficiently. RSTS first builds a presorted table whose tuples are arranged in the order in which they are retrieved by a round-robin traversal of the sorted column lists. RSTS then runs in two phases. In phase 1, it acquires the candidate tuples with a sequential scan of the presorted table. In phase 2, it calculates the domination scores of the candidates and returns the query results with another sequential scan. We prove that RSTS supports early termination and provide a theoretical analysis of its scan depths. We also devise a pruning rule for candidate tuples; its theoretical pruning effect shows that the majority of the skyline results can be discarded directly. Extensive experiments on synthetic and real-life data sets show that RSTS significantly outperforms the existing algorithms.
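To make the query semantics concrete, the sketch below gives a brute-force reference for the top-k skyline query and a possible reading of the presorted-table order. It is not RSTS itself. It assumes that smaller attribute values are preferred, that a tuple's domination score is the number of tuples it dominates, and that the presorted table lists each tuple the first time it is reached by a round-robin pass over the per-column sorted lists. All function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(p: Point, q: Point) -> bool:
    """p dominates q: no worse on every attribute and strictly better on at least
    one (assumption: smaller values are preferred)."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def domination_score(p: Point, data: Sequence[Point]) -> int:
    """Number of tuples in data that p dominates."""
    return sum(1 for q in data if dominates(p, q))


def skyline(data: Sequence[Point]) -> List[Point]:
    """Tuples not dominated by any other tuple (naive O(n^2) check)."""
    return [p for p in data if not any(dominates(q, p) for q in data)]


def topk_skyline(data: Sequence[Point], k: int) -> List[Tuple[int, Point]]:
    """Brute-force reference: the k skyline tuples with the highest domination
    scores, returned as (score, tuple) pairs. Ties keep input order because
    Python's sort is stable even with reverse=True."""
    scored = [(domination_score(p, data), p) for p in skyline(data)]
    scored.sort(key=lambda sp: sp[0], reverse=True)
    return scored[:k]


def presorted_order(data: Sequence[Point]) -> List[int]:
    """Hypothetical reconstruction of the presorted table: sort tuple ids on each
    column, then visit the sorted lists round-robin (depth by depth), emitting
    each id the first time it is seen."""
    if not data:
        return []
    n, m = len(data), len(data[0])
    columns = [sorted(range(n), key=lambda i, j=j: data[i][j]) for j in range(m)]
    seen, order = set(), []
    for depth in range(n):
        for col in columns:
            i = col[depth]
            if i not in seen:
                seen.add(i)
                order.append(i)
    return order


if __name__ == "__main__":
    data = [(1, 9), (2, 4), (4, 2), (9, 1), (5, 5), (6, 7), (3, 8)]
    print(topk_skyline(data, 2))
    print(presorted_order(data))
```

For the sample data, the skyline is (1, 9), (2, 4), (4, 2), (9, 1), and the top-2 result is (2, 4) with score 3 followed by (4, 2) with score 2. The brute-force reference is quadratic in the number of tuples, which is the kind of cost the abstract says existing approaches cannot afford on massive data. The round-robin order places tuples that are good on some attribute near the front of the table, which is the intuition behind scanning only a prefix of it.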