The Skyline of a Probabilistic Relation

In a deterministic relation R, tuple u dominates tuple v if u is no worse than v on all the attributes of interest and strictly better than v on at least one of them. This concept is at the heart of skyline queries, which return the set of undominated tuples in R. In this paper, we extend the notion of skyline to probabilistic relations by generalizing the definition of tuple domination to this context. Our approach is parametric in the semantics used to linearly rank probabilistic tuples and, being based on order-theoretic principles, preserves the three fundamental properties the skyline has in the deterministic case: (1) it equals the union of all top-1 results of monotone scoring functions; (2) it requires no additional parameters; and (3) it is insensitive to actual attribute scales. We then show how domination among probabilistic tuples (P-domination for short) can be checked efficiently by means of a set of rules. We detail such rules for the cases in which tuples are ranked using either the “expected rank” or the “expected score” semantics, and explain how the approach can be applied to other semantics as well. Since computing the skyline of a probabilistic relation is time-consuming, we introduce a family of algorithms that check P-domination rules in an optimized way. Experiments show that these algorithms can significantly reduce execution times with respect to a naive evaluation.
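To make the deterministic starting point concrete, the following is a minimal Python sketch of tuple domination and of a naive nested-loop skyline. It covers only the deterministic case, not P-domination or the optimized algorithms described above. It assumes that lower values are better on every attribute, and the names `dominates`, `skyline`, `top1`, and the sample `hotels` data are hypothetical illustrations. The `top1` helper uses a positively weighted sum, one example of a monotone scoring function, to show that its best tuple always lies in the skyline, in line with property (1).

```python
from typing import List, Sequence, Tuple

Row = Tuple[float, ...]


def dominates(u: Sequence[float], v: Sequence[float]) -> bool:
    """True if u is no worse than v on every attribute and strictly better on at least one.

    Assumption (hypothetical convention): smaller values are better on all attributes.
    """
    return all(a <= b for a, b in zip(u, v)) and any(a < b for a, b in zip(u, v))


def skyline(relation: Sequence[Row]) -> List[Row]:
    """Naive O(n^2) skyline: keep every tuple that no other tuple dominates."""
    return [v for v in relation if not any(dominates(u, v) for u in relation)]


def top1(relation: Sequence[Row], weights: Sequence[float]) -> Row:
    """Best tuple under a weighted sum with positive weights (a monotone scoring function)."""
    return min(relation, key=lambda t: sum(w * x for w, x in zip(weights, t)))


# Hypothetical data: (price, distance) pairs, lower is better for both.
hotels: List[Row] = [(50, 3.0), (80, 1.0), (60, 2.5), (90, 2.0), (50, 3.5)]

print(skyline(hotels))  # [(50, 3.0), (80, 1.0), (60, 2.5)]
```

Here (90, 2.0) is dominated by (80, 1.0) and (50, 3.5) by (50, 3.0), so both are dropped. Note that (60, 2.5) is in the skyline but is not the top-1 result of any weighted sum; it is the top-1 result of other monotone functions, which is why property (1) refers to monotone scoring functions in general rather than to linear ones only.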
