Threshold-Based Direct Computation of Skyline Objects for Database with Uncertain Preferences

Skyline queries find the set of skyline objects in a given database. For categorical data, preferences between attribute values determine which objects are skyline objects. In many real-world applications these preferences are uncertain. In such contexts, the relevant quantity is the probability that an object is a skyline object in a database with uncertain pairwise preferences, and a skyline query returns the set of objects whose skyline probability is greater than a given threshold. In this paper, we address this problem. To the best of our knowledge, no existing technique handles this problem directly. Earlier proposals compute the skyline probability of individual objects, but applying them to every object in order to answer a skyline query is computationally expensive. We propose a holistic algorithm that determines the set of skyline objects for a given threshold and a database of uncertain preferences. We establish the relationship between skyline probability and the probability of the union of events. We guide our search so as to prune objects that are unlikely to be skyline objects. We report an extensive experimental analysis to demonstrate the efficiency of our algorithm.
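The link between skyline probability and the probability of a union of events can be illustrated with a minimal sketch. It is not the algorithm proposed in the paper. It assumes that preferences on different dimensions are independent, so the probability that one object dominates another is a product of per-dimension preference probabilities, and it uses only the elementary bounds max P(E_i) <= P(E_1 or ... or E_n) <= min(1, sum P(E_i)), where E_i is the event that the i-th other object dominates the candidate. The names `dominance_prob`, `classify`, and the `pref` table layout are hypothetical and chosen for illustration only.

```python
from math import prod


def dominance_prob(q, o, pref):
    """P(q dominates o), ASSUMING independent preferences across dimensions.

    pref[j][(a, b)] is a hypothetical table entry: the probability that
    value a is preferred to value b on dimension j.
    """
    if q == o:
        return 0.0  # identical objects do not dominate each other
    return prod(
        1.0 if a == b else pref[j].get((a, b), 0.0)
        for j, (a, b) in enumerate(zip(q, o))
    )


def classify(o, others, pref, tau):
    """Decide membership of o from simple bounds on the union probability.

    sky(o) = 1 - P(at least one other object dominates o).
    Returns "skyline", "pruned", or "undecided" (exact computation needed).
    """
    probs = [dominance_prob(q, o, pref) for q in others]
    union_lower = max(probs, default=0.0)   # union is at least its largest event
    union_upper = min(1.0, sum(probs))      # Boole's inequality
    sky_upper = 1.0 - union_lower
    sky_lower = 1.0 - union_upper
    if sky_upper <= tau:
        return "pruned"      # cannot exceed the threshold
    if sky_lower > tau:
        return "skyline"     # certainly exceeds the threshold
    return "undecided"


# Illustrative data (invented for this sketch): two categorical dimensions.
pref = [
    {("a", "b"): 0.9, ("b", "a"): 0.1},
    {("x", "y"): 0.8, ("y", "x"): 0.2},
]
o1, o2, o3 = ("a", "x"), ("b", "y"), ("a", "y")
for obj, rest in [(o1, [o2, o3]), (o2, [o1, o3]), (o3, [o1, o2])]:
    print(obj, classify(obj, rest, pref, tau=0.5))
```

Objects that come back as "undecided" are the ones for which these coarse bounds are inconclusive; they would need tighter bounds on the union probability or an exact computation of the skyline probability.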
