SUM Query Processing over Probabilistic Data

SUM queries are crucial for many applications that deal with probabilistic data. In this report, we study ALL_SUM queries, which return all possible sum values together with their probabilities. In general, there is no efficient solution to the problem of evaluating ALL_SUM queries. However, for many practical applications in which the aggregate values are small integers or real numbers with small precision, efficient solutions can be developed. Based on a recursive approach, we propose a complete solution to this problem. We implemented our solution and conducted an extensive experimental evaluation over synthetic and real-world data sets; the results show its effectiveness.
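To make the ALL_SUM semantics concrete, the following is a minimal sketch and not the algorithm proposed in the report. It assumes a tuple-independent model, in which each tuple carries a value and a membership probability and tuples appear independently of one another. The function name `all_sum` and the `(value, probability)` input format are hypothetical. The sketch folds in one tuple at a time, which mirrors a recursive formulation: the distribution over the first n tuples is derived from the distribution over the first n - 1.

```python
from collections import defaultdict


def all_sum(tuples):
    """Return a dict mapping each possible sum to its probability.

    `tuples` is a list of (value, probability) pairs. Each tuple is assumed
    to be present independently with the given probability (an assumption
    of this sketch, not a statement about the report's data model).
    """
    # With no tuples considered, the sum is 0 with certainty.
    dist = {0: 1.0}
    for value, p in tuples:
        nxt = defaultdict(float)
        for s, q in dist.items():
            nxt[s] += q * (1.0 - p)   # world where the tuple is absent
            nxt[s + value] += q * p   # world where the tuple is present
        dist = dict(nxt)
    return dist


if __name__ == "__main__":
    # Two tuples with value 1, each present with probability 0.5.
    for total, prob in sorted(all_sum([(1, 0.5), (1, 0.5)]).items()):
        print(total, prob)
```

The cost of each step is proportional to the number of distinct sums seen so far. When the values are small integers, that number stays bounded by the total of the values rather than growing with the number of possible worlds, which is the regime the abstract identifies as tractable.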
