Processing over Probabilistic Data

SUM queries are crucial for many applications that deal with probabilistic data. In this report, we focus on ALL_SUM queries, which return all possible sum values together with their probabilities. In general, there is no efficient solution for evaluating ALL_SUM queries, since the number of possible sum values can grow exponentially with the number of tuples. However, in many practical applications the aggregate values are small integers or real numbers with limited precision, and for such data efficient solutions can be developed. Based on a recursive approach, we propose a complete solution for this problem. We implemented our solution and conducted an extensive experimental evaluation over synthetic and real-world data sets; the results show its effectiveness.
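To illustrate why small integer values make ALL_SUM tractable, here is a minimal sketch. It is not the report's algorithm. It assumes a simple tuple-independent model: each tuple is a pair (value, prob), where value is a small integer and prob is the probability that the tuple exists, independently of the others. Under that assumption, the distribution of the sum over the first i tuples follows the recursion P_i(s) = p_i * P_{i-1}(s - v_i) + (1 - p_i) * P_{i-1}(s), with P_0(0) = 1. The function name `all_sum` and the input format are hypothetical, chosen only for this example.

```python
from collections import defaultdict


def all_sum(tuples):
    """Return {sum_value: probability} for SUM over uncertain tuples.

    Hypothetical model (assumption, not from the report): each element of
    `tuples` is (value, prob), where `value` is a small integer and `prob`
    is the probability that the tuple exists, independently of all other
    tuples. The world in which no tuple exists has sum 0.
    """
    # dist[s] = probability that the tuples processed so far sum to s (P_0).
    dist = {0: 1.0}
    for value, prob in tuples:
        nxt = defaultdict(float)
        for s, p in dist.items():
            # Tuple absent: the running sum stays s.
            nxt[s] += p * (1.0 - prob)
            # Tuple present: the running sum grows by `value`.
            nxt[s + value] += p * prob
        dist = dict(nxt)  # P_i computed from P_{i-1}
    return dist


if __name__ == "__main__":
    # Two tuples, each present with probability 0.5: sums 0, 1, 2, 3.
    for s, p in sorted(all_sum([(1, 0.5), (2, 0.5)]).items()):
        print(s, p)
```

Each step performs one pass over the current set of reachable sums. When values are small integers, that set is bounded by the range of possible sums rather than by the 2^n possible worlds. Real values with limited precision could be handled the same way by first scaling them to integers (for example, by multiplying by 10^d for d decimal places). With arbitrary real values, the dictionary can grow to 2^n entries.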
