Skyline query processing for uncertain data

Recently, several research efforts have addressed answering skyline queries efficiently over large datasets. However, these efforts lack methods to compute such queries over uncertain data, where each uncertain value is represented as a range. In this paper, we define skyline queries over continuous uncertain data and propose a novel, efficient framework to answer these queries. Query answers are probabilistic: each object is associated with the probability that it is a query answer. Typically, users specify a probability threshold that each returned object's probability must exceed, and a tolerance value that defines the allowed error margin in the probability calculation, which reduces the computational overhead. Our framework employs an efficient two-phase query processing algorithm.
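To make the query semantics concrete, here is a minimal Python sketch. It is not the two-phase algorithm described above. It assumes that each uncertain attribute is uniformly distributed over its range, that dimensions and objects are independent, and that smaller values are preferred. Under those assumptions, an object whose actual value is x is a skyline answer exactly when no other object dominates x. Its skyline probability is therefore the expectation, over x, of the product of (1 - P(o' dominates x)) across all other objects o'. The sketch estimates that expectation by Monte Carlo sampling and uses the tolerance to choose the sample count from Hoeffding's inequality. The function names, the uniform model, and the confidence parameter `delta` are illustrative assumptions.

```python
import math
import random
from typing import Dict, List, Tuple

# Hypothetical representation: one (low, high) interval per dimension.
Box = List[Tuple[float, float]]


def cdf_uniform(lo: float, hi: float, x: float) -> float:
    """P(V <= x) for V ~ Uniform[lo, hi]; a point mass when lo == hi."""
    if hi == lo:
        return 1.0 if x >= lo else 0.0
    return min(1.0, max(0.0, (x - lo) / (hi - lo)))


def prob_dominates(other: Box, x: List[float]) -> float:
    """P(other <= x in every dimension), assuming independent dimensions.

    Smaller values are preferred. Ties count as dominance, which only
    matters when two point-valued coordinates coincide exactly.
    """
    p = 1.0
    for (lo, hi), xd in zip(other, x):
        p *= cdf_uniform(lo, hi, xd)
    return p


def skyline_probabilities(objects: Dict[str, Box], tolerance: float,
                          delta: float = 0.05, seed: int = 0) -> Dict[str, float]:
    # Hoeffding bound (an assumed way to use the tolerance): with n samples,
    # each in [0, 1], the estimate lies within +/- tolerance of the true
    # probability with confidence 1 - delta.
    n = math.ceil(math.log(2.0 / delta) / (2.0 * tolerance ** 2))
    rng = random.Random(seed)
    probs = {}
    for name, box in objects.items():
        total = 0.0
        for _ in range(n):
            # Draw one possible value of this object...
            x = [rng.uniform(lo, hi) for lo, hi in box]
            # ...then compute exactly the chance that no other object dominates it.
            survive = 1.0
            for other_name, other in objects.items():
                if other_name != name:
                    survive *= 1.0 - prob_dominates(other, x)
            total += survive
        probs[name] = total / n
    return probs


def probabilistic_skyline(objects: Dict[str, Box], threshold: float,
                          tolerance: float, **kw) -> Dict[str, float]:
    """Return the objects whose estimated skyline probability exceeds threshold."""
    probs = skyline_probabilities(objects, tolerance, **kw)
    return {name: p for name, p in probs.items() if p > threshold}
```

For example, consider three objects: a point object A at (1, 1), an object B uniform over [2, 3] × [2, 3], and an object C uniform over [0, 2] × [0, 2]. A's probability is exactly 0.75, because C falls below A in both dimensions with probability 0.25. B's probability is 0, because A always dominates it. C's estimate lies close to its true value of 0.75. With a threshold of 0.7 and a tolerance of 0.02, the query therefore returns A and C.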
