Parallelizing uncertain skyline computation against n‐of‐N data streaming model

The skyline query over uncertain data streams, an important aspect of big data analysis, plays a significant role in domains such as environmental monitoring, decision‐making, and data mining. Under the sliding window model, such a query always considers the most recent N streaming items and therefore cannot serve queries over different window sizes at the same time. To improve query flexibility and efficiency, we propose a parallel method for processing uncertain n‐of‐N skyline queries, that is, computing in parallel the skyline of the most recent n items for any n ≤ N. Specifically, we first propose a framework for parallelizing the computation of uncertain n‐of‐N skylines. We then put forward a sliding window partitioning strategy and a streaming item mapping strategy to balance the load across compute nodes. In addition, we define RST, a spatial index structure based on the R‐tree, to organize the elements within each individual sliding window and its candidate set, which significantly speeds up dominance tests. Most importantly, we provide an interval encoding scheme that transforms the n‐of‐N query into a stabbing query on each compute node, which greatly narrows the query scope and improves query efficiency. We use a red‐black tree named RBI to store all stabbing intervals. Extensive experimental results demonstrate that the proposed methods are efficient and meet the query requirements of users in real applications.
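To make the interval encoding idea concrete, the following is a minimal single-node sketch, assuming certain (non-probabilistic) data and "smaller is better" in every dimension. It is not the method of this paper: probabilities, parallelism, RST, and RBI are all omitted, a plain list with a linear scan stands in for the red‐black tree, and the names `dominates`, `encode_intervals`, and `n_of_n_skyline` are hypothetical. Each surviving element with arrival position k is encoded as the half-open interval (a, k], where a is the position of its youngest older dominator (0 if none); an n‐of‐N query then reduces to finding the intervals stabbed by the point M − n + 1, where M is the position of the newest item.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]
Interval = Tuple[int, int, Point]  # (a, k, element), representing (a, k]


def dominates(p: Point, q: Point) -> bool:
    """p dominates q: no worse in every dimension, strictly better in one."""
    return all(x <= y for x, y in zip(p, q)) and any(x < y for x, y in zip(p, q))


def encode_intervals(stream: Sequence[Point], N: int) -> List[Interval]:
    """Encode the most recent N items as stabbing intervals (brute force)."""
    M = len(stream)
    start = max(1, M - N + 1)  # 1-based position of the oldest item kept
    intervals: List[Interval] = []
    for k in range(start, M + 1):
        e = stream[k - 1]
        # An item dominated by a younger item can never be in any skyline.
        if any(dominates(stream[j - 1], e) for j in range(k + 1, M + 1)):
            continue
        # Find the youngest older dominator inside the window, if any.
        a = 0
        for j in range(k - 1, start - 1, -1):
            if dominates(stream[j - 1], e):
                a = j
                break
        intervals.append((a, k, e))
    return intervals


def n_of_n_skyline(intervals: List[Interval], M: int, n: int) -> List[Point]:
    """Skyline of the most recent n items: intervals stabbed by M - n + 1."""
    s = M - n + 1
    return [e for a, k, e in intervals if a < s <= k]


stream = [(1, 5), (3, 3), (4, 4), (2, 6), (5, 1)]
intervals = encode_intervals(stream, N=5)
print(n_of_n_skyline(intervals, M=5, n=5))  # [(1, 5), (3, 3), (5, 1)]
print(n_of_n_skyline(intervals, M=5, n=2))  # [(2, 6), (5, 1)]
```

The intervals are computed once per window state, after which every n ≤ N is answered from the same structure; storing them in a balanced tree would replace the linear scan in `n_of_n_skyline` with a logarithmic-time stabbing lookup plus output cost.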
