LShape Partitioning: Parallel Skyline Query Processing Using MapReduce

A skyline query returns the data points that are not dominated by any other point in the dataset. It is widely used in applications that require multi-criteria decision making. However, skyline query processing is considerably time-consuming for high-dimensional, large-scale datasets. Parallel computing techniques are therefore needed to address this challenge, and MapReduce is one of the most popular frameworks for processing big data. Many efficient MapReduce skyline algorithms have been proposed in the literature, and most of their designs focus on partitioning and pruning the given dataset. However, there are still opportunities for further parallelism. In this study, we propose two parallel skyline processing algorithms that use a novel LShape partitioning strategy and an effective Propagation Filtering method. The two algorithms, 2 Phase LShape and 1 Phase LShape, are designed for multiple reducers and a single reducer, respectively. Through extensive experiments, we verify that our algorithms outperform the state-of-the-art approaches, especially for high-dimensional, large-scale datasets.
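To make the notion of dominance concrete, the following is a minimal Python sketch of a skyline computation and of the generic "local skyline per partition, then merge" pattern that MapReduce skyline algorithms build on. It is not the LShape partitioning or Propagation Filtering method described above. The names `dominates`, `skyline`, and `partitioned_skyline` are hypothetical, the round-robin split is a placeholder for a real partitioning strategy, and the sketch assumes that smaller values are preferred in every dimension.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(p: Point, q: Point) -> bool:
    """Return True if p dominates q.

    Assumption: smaller is better in every dimension. p dominates q when
    p is no worse than q everywhere and strictly better somewhere.
    """
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def skyline(points: Sequence[Point]) -> List[Point]:
    """Naive quadratic skyline: keep the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def partitioned_skyline(points: Sequence[Point], num_partitions: int) -> List[Point]:
    """Generic two-step pattern (hypothetical helper, not the paper's algorithm).

    Step 1 ("map"): compute a local skyline inside each partition.
    Step 2 ("reduce"): merge the local skylines and filter once more.
    """
    # Placeholder partitioning: round-robin split of the input.
    partitions = [points[i::num_partitions] for i in range(num_partitions)]
    local_skylines = [skyline(part) for part in partitions]
    candidates = [p for part in local_skylines for p in part]
    return skyline(candidates)


if __name__ == "__main__":
    data = [(1, 9), (3, 3), (2, 8), (9, 1), (5, 5), (4, 4)]
    print(sorted(skyline(data)))
    print(sorted(partitioned_skyline(data, 3)))
```

The merge step is correct because any point dominated within its own partition is also dominated in the full dataset, so discarding it locally never removes a true skyline point. How the data is partitioned only affects how much work each step performs.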
