Global Top-k Aggregate Queries Based on X-tuple in Uncertain Database

A Top-k aggregate query, which is a powerful technique when dealing with large quantity of data, ranks groups of tuples by their aggregate values and returns k groups with the highest aggregate values. However, compared to Top-k in traditional databases, queries over uncertain database are more complicated because of the existence of exponential possible worlds. As a powerful semantic of Top-k in uncertain database, Global Top-k return k highest-ranked tuples according to their probabilities of being in the Top-k answers in possible worlds. We propose a x-tuple based method to process Global Top-k aggregate queries in uncertain database. Our method has two levels, group state generation and G-x-Top-k query processing. In the former level, group states, which satisfy the properties of x-tuple, are generated one after the other according to their aggregate values, while in the latter level, dynamic programming based Global x-tuple Top-k query processing are employed to return the answers. Comprehensive experiments on different data sets demonstrate the effectiveness of the proposed solutions.

[1]  Ihab F. Ilyas,et al.  A survey of top-k query processing techniques in relational database systems , 2008, CSUR.

[2]  Feifei Li,et al.  Efficient Processing of Top-k Queries in Uncertain Databases with x-Relations , 2008, IEEE Transactions on Knowledge and Data Engineering.

[3]  Rajeev Motwani,et al.  Robust and efficient fuzzy match for online data cleaning , 2003, SIGMOD '03.

[4]  Jian Pei,et al.  Efficiently Answering Probabilistic Threshold Top-k Queries on Uncertain Data , 2008, 2008 IEEE 24th International Conference on Data Engineering.

[5]  Minos N. Garofalakis,et al.  Adaptive cleaning for RFID data streams , 2006, VLDB.

[6]  Jeffrey Xu Yu,et al.  Sliding-window top-k queries on uncertain streams , 2008, The VLDB Journal.

[7]  Jeffrey Xu Yu,et al.  Sliding-window top-k queries on uncertain streams , 2008, Proc. VLDB Endow..

[8]  Sunil Prabhakar,et al.  Evaluating probabilistic queries over imprecise data , 2003, SIGMOD '03.

[9]  Dan Suciu,et al.  Management of probabilistic data: foundations and challenges , 2007, PODS '07.

[10]  Xi Zhang,et al.  On the semantics and evaluation of top-k queries in probabilistic databases , 2008, ICDE Workshops.

[11]  Yufei Tao,et al.  Indexing Multi-Dimensional Uncertain Data with Arbitrary Probability Density Functions , 2005, VLDB.

[12]  Dan Olteanu,et al.  From complete to incomplete information and back , 2007, SIGMOD '07.

[13]  Xiang Lian,et al.  Probabilistic ranked queries in uncertain databases , 2008, EDBT '08.

[14]  Val Tannen,et al.  Models for Incomplete and Probabilistic Information , 2006, IEEE Data Eng. Bull..

[15]  Mohamed A. Soliman,et al.  Top-k Query Processing in Uncertain Databases , 2007, 2007 IEEE 23rd International Conference on Data Engineering.

[16]  Dan Suciu,et al.  Efficient query evaluation on probabilistic databases , 2004, The VLDB Journal.

[17]  Jennifer Widom,et al.  Working Models for Uncertain Data , 2006, 22nd International Conference on Data Engineering (ICDE'06).

[18]  V. Conclusion , .

[19]  Joann J. Ordille,et al.  Data integration: the teenage years , 2006, VLDB.

[20]  Ihab F. Ilyas,et al.  Efficient search for the top-k probable nearest neighbors in uncertain databases , 2008, Proc. VLDB Endow..

[21]  Parag Agrawal,et al.  Trio: a system for data, uncertainty, and lineage , 2006, VLDB.

[22]  Dan Olteanu,et al.  10106 Worlds and Beyond: Efficient Representation and Processing of Incomplete Information , 2007, ICDE.

[23]  Susanne E. Hambrusch,et al.  Indexing Uncertain Categorical Data , 2007, 2007 IEEE 23rd International Conference on Data Engineering.

[24]  Wei Hong,et al.  Model-Driven Data Acquisition in Sensor Networks , 2004, VLDB.

[25]  Serge Abiteboul,et al.  On the representation and querying of sets of possible worlds , 1987, SIGMOD '87.

[26]  Feifei Li,et al.  Semantics of Ranking Queries for Probabilistic Data and Expected Ranks , 2009, 2009 IEEE 25th International Conference on Data Engineering.

[27]  Ling Liu,et al.  From Data Privacy to Location Privacy: Models and Algorithms , 2007, VLDB.

[28]  Rahul Gupta,et al.  Creating probabilistic databases from information extraction models , 2006, VLDB.

[29]  Kevin Chen-Chuan Chang,et al.  Probabilistic top-k and ranking-aggregate queries , 2008, TODS.