Top k probabilistic skyline queries on uncertain data

Abstract: Uncertainty of data is inherent in many applications, and query processing over uncertain data has gained widespread attention. The probabilistic skyline query is a powerful tool for managing uncertain data. However, the well-known probabilistic skyline query, called the p-skyline query, is likely to return unattractive objects that have no advantage over the other query results in either their attributes or their skyline probabilities. Moreover, it may return too many objects to offer any meaningful insight to customers. In this paper, we first propose a modified p-skyline (MPS) query based on a strong dominance operator to identify truly attractive results. We then formulate a top k MPS (TkMPS) query on the basis of a new ranking criterion. We present effective approaches for processing the MPS query and extend them to process the TkMPS query. To improve query performance, a reuse technique is adopted. Extensive experiments verify that the proposed algorithms for the MPS and TkMPS queries are efficient and effective. Our MPS query can filter out up to 34.44% of the objects in the p-skyline query results as unattractive, and even in cases where the MPS and p-skyline queries return the same results, our MPS query incurs much lower CPU, I/O, and memory costs.
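For readers unfamiliar with the baseline that the abstract criticizes, the following is a minimal sketch of a plain p-skyline query. It is not the paper's MPS or TkMPS algorithm: the abstract does not define the strong dominance operator or the ranking criterion, so neither is modeled here. The sketch assumes a simple uncertainty model in which each object is a single point that exists independently with a given probability, and that smaller attribute values are preferred. The names `UncertainObject`, `dominates`, `skyline_probability`, and `p_skyline` are illustrative and do not come from the paper.

```python
from dataclasses import dataclass
from math import prod
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class UncertainObject:
    """Hypothetical model: one point plus an independent existence probability."""
    name: str
    attrs: Tuple[float, ...]
    prob: float


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """a dominates b if a is no worse in every attribute and strictly better in one.

    Assumption: smaller values are better in every dimension.
    """
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def skyline_probability(obj: UncertainObject, objects: Sequence[UncertainObject]) -> float:
    """Probability that obj exists and that no object dominating it exists."""
    return obj.prob * prod(
        1.0 - other.prob
        for other in objects
        if other is not obj and dominates(other.attrs, obj.attrs)
    )


def p_skyline(objects: Sequence[UncertainObject], p: float) -> List[UncertainObject]:
    """Return the objects whose skyline probability is at least the threshold p."""
    return [o for o in objects if skyline_probability(o, objects) >= p]


# Illustrative data (made up for this sketch): attributes are (price, distance).
hotels = [
    UncertainObject("A", (1.0, 4.0), 0.9),
    UncertainObject("B", (2.0, 2.0), 0.6),
    UncertainObject("C", (3.0, 3.0), 0.8),  # dominated by B only
    UncertainObject("D", (4.0, 1.0), 0.5),
]

result_names = [o.name for o in p_skyline(hotels, 0.5)]
print(result_names)
```

In this toy data, C is dominated only by B, so its skyline probability is its own existence probability multiplied by the probability that B is absent; the other three objects are undominated and keep their existence probabilities. This naive evaluation compares every pair of objects and is meant only to make the definition concrete, not to reflect the efficiency techniques the paper proposes.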
