Paper information - Consensus answers for queries over probabilistic databases

We address the problem of finding a "best" deterministic answer to a query over a probabilistic database. To this end, we propose the notion of a consensus world (or consensus answer): a deterministic world (answer) that minimizes the expected distance to the possible worlds (answers). This problem can be seen as a generalization of well-studied inconsistent-information aggregation problems (e.g., rank aggregation) to probabilistic databases. We consider it for several types of queries, including SPJ queries, Top-k ranking queries, group-by aggregate queries, and clustering. For different distance metrics, we either obtain polynomial-time optimal or approximation algorithms for computing the consensus answers or prove NP-hardness. Most of our results hold for a general probabilistic database model, called the and/xor tree model, which significantly generalizes previous probabilistic database models such as x-tuples and block-independent disjoint models, and is of independent interest.
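To make the definition of a consensus world concrete, here is a minimal sketch in Python. It is an illustration only, not the paper's algorithm: it assumes a tiny tuple-independent database (a much simpler model than the and/xor tree), uses the symmetric-difference set distance, and all names (`probs`, `possible_worlds`, `expected_distance`, and so on) are hypothetical. It finds the consensus world by brute force and compares it with the simple rule "keep every tuple whose marginal probability exceeds 0.5", which is optimal for this particular distance by linearity of expectation.

```python
from itertools import product

def possible_worlds(probs):
    """Yield (world, probability) pairs, assuming independent tuples."""
    names = list(probs)
    for bits in product([False, True], repeat=len(names)):
        world = frozenset(n for n, b in zip(names, bits) if b)
        p = 1.0
        for n, b in zip(names, bits):
            p *= probs[n] if b else 1.0 - probs[n]
        yield world, p

def expected_distance(candidate, probs):
    """Expected symmetric-difference distance from candidate to a random world."""
    return sum(p * len(candidate ^ world) for world, p in possible_worlds(probs))

def consensus_world_bruteforce(probs):
    """Try every subset of tuples and keep the one with minimum expected distance."""
    candidates = [w for w, _ in possible_worlds(probs)]
    return min(candidates, key=lambda c: expected_distance(c, probs))

def consensus_world_threshold(probs):
    """Closed form for symmetric difference: keep tuples with probability > 0.5."""
    return frozenset(n for n, p in probs.items() if p > 0.5)

# Hypothetical example: three tuples with made-up existence probabilities.
probs = {"t1": 0.9, "t2": 0.4, "t3": 0.7}
print(sorted(consensus_world_bruteforce(probs)))
print(sorted(consensus_world_threshold(probs)))
```

The brute-force search is exponential in the number of tuples and is only meant to mirror the definition. For other distance metrics and query types (Top-k, aggregates, clustering), no such simple threshold rule should be assumed.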

Jian Li | Amol Deshpande
