Domination in the Probabilistic World

In a probabilistic database, whether a tuple u is better than another tuple v has no single answer: it depends on the specific Probabilistic Ranking Semantics (PRS) adopted to combine tuples' scores and probabilities. In deterministic databases, skyline queries are a well-known alternative to (top-k) ranking queries, because they relieve the user of the burden of specifying a scoring function that combines the values of different attributes into a single score. The skyline of a deterministic relation R is the set of undominated tuples in R, where tuple u dominates tuple v iff u is better than or equal to v on all the attributes of interest and strictly better on at least one of them. Domination is equivalent to having s(u) ≥ s(v) for all monotone scoring functions s(). The skyline of a probabilistic relation Rp can be defined similarly as the set of P-undominated tuples in Rp, where u P-dominates v iff, whatever monotone scoring function is used to combine the skyline attributes, the PRS at hand deems u better than v. This definition applies to arbitrary ranking semantics and probabilistic correlation models. Because it is parametric in the adopted PRS, it ensures that ranking and skyline queries always return consistent results. In this article we provide an overall view of the problem of computing the skyline of a probabilistic relation. We show how, under mild conditions that hold for all known PRSs, checking P-domination can be cast as an optimization problem, whose complexity we characterize for a variety of combinations of ranking semantics and correlation models. For each analyzed case we also provide specific P-domination rules, which are exploited by the algorithm we detail for the case where the probabilistic model is known to the query processor.
We also consider the case in which the probabilities of tuple events can only be obtained through an oracle, and describe another skyline algorithm for this loosely integrated scenario. Our experimental evaluation of P-domination rules and skyline algorithms confirms the theoretical analysis.
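
The deterministic notion of domination recalled above can be made concrete with a minimal Python sketch. It covers only the deterministic case, not P-domination, and it is not the algorithm of the article. The names `dominates` and `skyline`, the sample relation `R`, and the convention that larger attribute values are better are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def dominates(u: Sequence[float], v: Sequence[float]) -> bool:
    """Return True iff u dominates v (assumption: larger values are better).

    u must be at least as good as v on every attribute and strictly
    better on at least one.
    """
    at_least_as_good = all(a >= b for a, b in zip(u, v))
    strictly_better = any(a > b for a, b in zip(u, v))
    return at_least_as_good and strictly_better


def skyline(relation: List[Tuple[float, ...]]) -> List[Tuple[float, ...]]:
    """Naive quadratic skyline: keep the tuples no other tuple dominates."""
    return [u for u in relation
            if not any(dominates(v, u) for v in relation)]


# Hypothetical two-attribute relation, used only for illustration.
R = [(5, 1), (3, 3), (1, 5), (2, 2), (3, 1)]
print(skyline(R))  # [(5, 1), (3, 3), (1, 5)]
```

Here (2, 2) is dominated by (3, 3) and (3, 1) is dominated by (5, 1), so neither belongs to the skyline. The pairwise scan is quadratic in the size of the relation and is meant only to illustrate the definition.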
