k-Dominant Skyline Query Computation in MapReduce Environment

Filtering out uninteresting data is important for making use of “big data”. The skyline query is a popular technique for this purpose: from a given large database, it selects the set of objects that are not dominated by any other object. However, a skyline query often retrieves too many objects to analyze intensively, especially for high-dimensional datasets. To address this problem, k-dominant skyline queries have been introduced. Databases are sometimes too large to process in a centralized environment, and conventional algorithms for computing k-dominant skyline queries are not well suited to parallel and distributed environments such as the MapReduce framework. In this paper, we consider an efficient parallel algorithm for processing k-dominant skyline queries in the MapReduce framework. Extensive experiments demonstrate the scalability of the proposed algorithm on large synthetic datasets under different settings of data distribution, dimensionality, and cardinality.

Key words: skyline query, k-dominant skyline query, MapReduce, big data
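The abstract does not spell out k-dominance or how the work can be split across MapReduce jobs. The Python sketch below illustrates both under stated assumptions. It uses the usual definition: a point p k-dominates q when p is no worse than q in at least k dimensions and strictly better in at least one of them. With k equal to the dimensionality, this reduces to ordinary skyline dominance. Smaller values are assumed to be better. The sketch simulates a hypothetical two-round scheme in plain Python: round-robin partitioning, pruning inside each partition, then verification of the surviving candidates against the full data. The names `k_dominates`, `local_candidates` and `kdsky_mapreduce` are illustrative only. This is not the algorithm proposed in the paper.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Set, Tuple

Point = Tuple[float, ...]


def k_dominates(p: Point, q: Point, k: int) -> bool:
    """True if p k-dominates q, assuming smaller values are better: p is no
    worse than q in at least k dimensions and strictly better in at least one."""
    no_worse = sum(1 for a, b in zip(p, q) if a <= b)
    better = sum(1 for a, b in zip(p, q) if a < b)
    return no_worse >= k and better >= 1


def local_candidates(points: Sequence[Point], k: int) -> List[Point]:
    """Points not k-dominated by any other point of `points`.

    Applied to the whole dataset this is the (naive, quadratic) k-dominant
    skyline; applied to one partition it yields that partition's candidates.
    """
    return [
        p for i, p in enumerate(points)
        if not any(k_dominates(q, p, k) for j, q in enumerate(points) if j != i)
    ]


def kdsky_mapreduce(points: Sequence[Point], k: int, num_partitions: int = 3) -> List[Point]:
    """Two simulated MapReduce rounds (hypothetical scheme, not the paper's)."""
    # Round 1, map: round-robin partitioning by input position (an assumption;
    # any partitioning keeps the result correct).
    partitions: Dict[int, List[Point]] = defaultdict(list)
    for idx, p in enumerate(points):
        partitions[idx % num_partitions].append(p)

    # Round 1, reduce: a point k-dominated inside its own partition can never be
    # in the global answer, so each reducer keeps only its local candidates.
    candidates = [c for part in partitions.values() for c in local_candidates(part, k)]

    # Round 2, map: k-dominance is not transitive, so a candidate may be
    # k-dominated only by a point pruned in round 1. Every partition therefore
    # checks every candidate against its full data, not just its candidates.
    dominated: Set[int] = set()
    for part in partitions.values():
        for ci, c in enumerate(candidates):
            if any(k_dominates(q, c, k) for q in part):
                dominated.add(ci)

    # Round 2, reduce: keep candidates that no partition reported as dominated.
    return [c for ci, c in enumerate(candidates) if ci not in dominated]


if __name__ == "__main__":
    cycle = [(1, 2, 3), (2, 3, 1), (3, 1, 2)]
    print(kdsky_mapreduce(cycle, k=2, num_partitions=2))  # []
    print(kdsky_mapreduce(cycle, k=3, num_partitions=2))  # [(1, 2, 3), (3, 1, 2), (2, 3, 1)]
```

With k = 2 the three example points form a cycle, because each 2-dominates the next. The 2-dominant skyline is therefore empty. With k = 3, which is ordinary dominance, all three points are skyline points.

In the k = 2 run, round 1 prunes (1, 2, 3) from its partition. A round 2 that compared candidates only against other candidates would wrongly keep (2, 3, 1), because only the pruned point k-dominates it. This is why the sketch re-scans the full data in round 2, at the cost of a second pass.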
