Paper information - MapReduce Algorithm for Variants of Skyline Queries: Skyband and Dominating Queries

The skyline query and its variants are useful in the early stages of knowledge-discovery processes: they select a set of important objects that are better than the other, common objects in a dataset. To handle big data, such knowledge-discovery queries must be computed in parallel distributed environments. In this paper, we consider an efficient parallel algorithm for the “K-skyband query” and the “top-k dominating query”, two popular variants of the skyline query. We propose a method for computing both queries simultaneously in MapReduce, a popular parallel distributed framework for processing “big data” problems. Our extensive evaluation validates the effectiveness and efficiency of the proposed algorithm on both real and synthetic datasets.
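
The abstract names the two queries without defining them, so a small single-machine sketch may help fix the terminology. It is a brute-force illustration of the commonly used definitions, not the MapReduce algorithm proposed in the paper. It assumes that smaller values are better in every dimension, that the K-skyband contains the objects dominated by fewer than K others (some papers use "at most K" instead), and that ties in the dominating score keep input order. The function names `dominates`, `k_skyband`, and `top_k_dominating` are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if a is no worse than b in every dimension and
    strictly better in at least one (assumption: smaller is better)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def k_skyband(points: List[Point], k: int) -> List[Point]:
    """Objects dominated by fewer than k other objects.
    Under this convention, k = 1 yields the ordinary skyline."""
    result = []
    for p in points:
        dominators = sum(1 for q in points if dominates(q, p))
        if dominators < k:
            result.append(p)
    return result


def top_k_dominating(points: List[Point], k: int) -> List[Point]:
    """The k objects that dominate the largest number of other objects.
    Ties keep input order because Python's sort is stable."""
    scored = [(sum(1 for q in points if dominates(p, q)), p) for p in points]
    scored.sort(key=lambda pair: -pair[0])
    return [p for _, p in scored[:k]]


# Toy two-dimensional dataset (made up for illustration).
points = [(1, 9), (2, 4), (3, 3), (4, 6), (5, 5), (7, 2), (8, 8)]

skyline = k_skyband(points, 1)         # objects dominated by no other object
band3 = k_skyband(points, 3)           # objects dominated by at most two others
top2 = top_k_dominating(points, 2)     # the two objects with the highest dominating score
```

Both functions compare every pair of objects, which costs quadratic time in the dataset size. That cost is the reason such queries need a parallel distributed treatment once the data no longer fits on one machine.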