Paper information - k-Dominant Skyline Query Computation in MapReduce Environment

k-Dominant Skyline Query Computation in MapReduce Environment

Filtering out uninteresting data is important for making use of “big data”. The skyline query is a popular technique for such filtering: from a given large database, it selects the set of objects that are not dominated by any other object. However, a skyline query often retrieves too many objects to analyze intensively, especially for high-dimensional datasets. To solve this problem, k-dominant skyline queries have been introduced. Databases sometimes become too large to process in a centralized environment. Conventional algorithms for computing k-dominant skyline queries are not well suited to parallel and distributed environments such as the MapReduce framework. In this paper, we consider an efficient parallel algorithm for processing k-dominant skyline queries in the MapReduce framework. Extensive experiments demonstrate the scalability of the proposed algorithm on synthetic big datasets under different settings of data distribution, dimensionality, and cardinality.

Key words: skyline query, k-dominant skyline query, MapReduce, big data

Mohammad Anisuzzaman Siddique | Yasuhiko Morimoto | Hao Tian
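
To make the dominance notions in the abstract concrete, the following is a minimal single-machine sketch, not the paper's MapReduce algorithm. It assumes that smaller values are better in every dimension and uses the standard k-dominance definition from the skyline literature: p k-dominates q if there are k dimensions in which p is no worse than q, with p strictly better in at least one of them. The names `k_dominates`, `skyline`, and `k_dominant_skyline` are illustrative and do not come from the paper.

```python
from typing import List, Sequence

Point = Sequence[float]


def k_dominates(p: Point, q: Point, k: int) -> bool:
    """Return True if p k-dominates q (assumption: smaller is better)."""
    # Count dimensions where p is no worse, and where p is strictly better.
    no_worse = sum(1 for a, b in zip(p, q) if a <= b)
    strictly_better = sum(1 for a, b in zip(p, q) if a < b)
    # Any strictly better dimension is also a "no worse" dimension, so it can
    # always be included in the chosen set of k dimensions.
    return no_worse >= k and strictly_better >= 1


def k_dominant_skyline(points: List[Point], k: int) -> List[Point]:
    """Naive O(n^2) filter: keep points that no other point k-dominates."""
    return [
        p
        for i, p in enumerate(points)
        if not any(k_dominates(q, p, k) for j, q in enumerate(points) if j != i)
    ]


def skyline(points: List[Point]) -> List[Point]:
    """Conventional skyline: k-dominance with k equal to the dimensionality."""
    if not points:
        return []
    return k_dominant_skyline(points, len(points[0]))


if __name__ == "__main__":
    data = [(1, 5, 3), (2, 2, 2), (3, 1, 4), (4, 4, 4)]
    print(skyline(data))                # conventional skyline
    print(k_dominant_skyline(data, 2))  # stricter 2-dominant skyline
```

On this toy dataset the conventional skyline keeps three of the four points, while the 2-dominant skyline keeps only (2, 2, 2). This illustrates why relaxing dominance to k dimensions gives a smaller, easier-to-analyze result on high-dimensional data.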
