Ranking with Uncertain Scores

Large databases with uncertain information are becoming more common in many applications, including data integration, location tracking, and Web search. In these applications, ranking records with uncertain attributes raises problems that are fundamentally different from those of conventional ranking. Specifically, uncertainty in records' scores induces a partial order over records, as opposed to the total order assumed in conventional ranking settings. In this paper, we present a new probabilistic model, based on partial orders, to encapsulate the space of possible rankings originating from score uncertainty. Under this model, we formulate several ranking query types with different semantics. We describe and analyze a set of efficient query evaluation algorithms. We show that our techniques can be used to solve the problem of rank aggregation in partial orders. In addition, we design novel sampling techniques to compute approximate query answers. Our experimental evaluation, on both real and synthetic data, demonstrates the efficiency and effectiveness of our techniques in different settings.
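To make the idea of a partial order induced by score uncertainty concrete, the following is a minimal sketch and not the paper's algorithm. It assumes each record's score is known only as an interval with a uniform distribution over it, and it estimates the probability that each record ranks first by plain independent Monte Carlo sampling. The record names, the intervals, and the helper functions `dominates`, `sample_ranking`, and `estimate_top1` are all hypothetical, chosen for illustration.

```python
import random

# Hypothetical records: each score is only known to lie in [lo, hi].
records = {
    "a": (6.0, 9.0),
    "b": (5.0, 7.0),
    "c": (1.0, 4.0),
}


def dominates(x, y):
    """x precedes y in the partial order when x's lowest possible
    score is at least y's highest possible score."""
    return records[x][0] >= records[y][1]


def sample_ranking(rng):
    """Draw one score per record (assumed uniform on its interval)
    and return the records sorted by descending drawn score."""
    drawn = {r: rng.uniform(lo, hi) for r, (lo, hi) in records.items()}
    return sorted(drawn, key=drawn.get, reverse=True)


def estimate_top1(n_samples=20000, seed=0):
    """Estimate, for each record, the probability of ranking first
    as the fraction of sampled rankings in which it is on top."""
    rng = random.Random(seed)
    counts = {r: 0 for r in records}
    for _ in range(n_samples):
        counts[sample_ranking(rng)[0]] += 1
    return {r: c / n_samples for r, c in counts.items()}


if __name__ == "__main__":
    # "a" and "b" overlap, so neither dominates the other;
    # both dominate "c", whose interval lies entirely below theirs.
    print(dominates("a", "b"), dominates("a", "c"), dominates("b", "c"))
    print(estimate_top1())
```

In this toy instance the intervals of `a` and `b` overlap, so the two records are incomparable and more than one ranking is possible, while `c` is dominated by both and can never rank first. Under the uniform assumption, the exact probability that `a` ranks first is 11/12, which the sampled estimate should approach as the number of samples grows.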
