MapReduce Algorithm for Variants of Skyline Queries: Skyband and Dominating Queries

The skyline query and its variants are useful in the early stages of knowledge-discovery processes: they select a set of important objects that are better than the other, common objects in the dataset. To handle big data, such knowledge-discovery queries must be computed in parallel distributed environments. In this paper, we consider an efficient parallel algorithm for the “K-skyband query” and the “top-k dominating query”, two popular variants of the skyline query. We propose a method for computing both queries simultaneously in MapReduce, a popular parallel distributed framework for processing “big data” problems. Our extensive evaluation results validate the effectiveness and efficiency of the proposed algorithm on both real and synthetic datasets.
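
To make the two queries concrete, the following is a minimal single-machine sketch of their standard definitions. It is not the MapReduce algorithm proposed in the paper. It assumes that smaller values are better in every dimension, and the function names (`dominates`, `k_skyband`, `top_k_dominating`) are hypothetical, chosen only for illustration.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(p: Point, q: Point) -> bool:
    """True if p is no worse than q everywhere and strictly better somewhere.

    Assumption: smaller values are better in every dimension.
    """
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def k_skyband(points: Sequence[Point], K: int) -> List[Point]:
    """Objects dominated by fewer than K other objects (K = 1 gives the skyline)."""
    return [p for p in points if sum(dominates(q, p) for q in points) < K]


def top_k_dominating(points: Sequence[Point], k: int) -> List[Point]:
    """The k objects that dominate the largest number of other objects."""
    scored = [(sum(dominates(p, q) for q in points), p) for p in points]
    # Stable sort by descending domination count; ties keep input order.
    scored.sort(key=lambda pair: -pair[0])
    return [p for _, p in scored[:k]]


if __name__ == "__main__":
    data = [(1, 5), (2, 2), (5, 1), (3, 3), (4, 4), (6, 6)]
    print(k_skyband(data, 1))         # the skyline
    print(k_skyband(data, 2))         # the 2-skyband
    print(top_k_dominating(data, 2))  # the two most dominating objects
```

Both functions compare every pair of objects, so they take quadratic time. That cost is what motivates distributing the computation over a cluster when the dataset is large.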
