Probabilistic Convex Hull Queries over Uncertain Data

The convex hull of a set P of two-dimensional points is the minimal convex polygon that contains all the points in P. The convex hull is important in many applications, such as GIS, statistical analysis, and data mining. Because data uncertainty, such as location uncertainty, is ubiquitous in real-world applications, we study the concept of the convex hull over uncertain data in 2D space. We propose the Probabilistic Convex Hull (PCH) query and demonstrate its applications, such as Flickr landscape photo extraction and activity region visualization, where location uncertainty is incurred by GPS devices or sensors. To tackle the problem of possible-world explosion, we develop an O(N^3) algorithm based on geometric properties, where N is the data size. We further improve this algorithm with spatial indices and effective pruning techniques, which prune the majority of data instances. To achieve better time complexity, we propose another O(N^2 log N) algorithm that maintains a probability oracle in the form of a circular array with nice properties. Finally, to support applications that require fast response, we develop a Gibbs-sampling-based approximation algorithm that efficiently finds the PCH with high accuracy. Extensive experiments are conducted to verify the efficiency of our algorithms for answering PCH queries.
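To make the possible-world explosion concrete, the following is a minimal brute-force sketch, not any of the algorithms summarized above. It assumes a discrete uncertainty model in which each object is a list of mutually exclusive instances with probabilities summing to 1, and in which objects are independent; that model, and the names `convex_hull` and `hull_probabilities`, are illustrative assumptions rather than definitions taken from the paper. The sketch enumerates every possible world and accumulates, for each object, the probability that it is a vertex of the convex hull.

```python
from itertools import product


def cross(o, a, b):
    """Z-component of (a - o) x (b - o); positive for a left turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order.

    Collinear boundary points are not reported as vertices.
    """
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def hull_probabilities(objects):
    """Exhaustively enumerate possible worlds (hypothetical helper).

    objects: list of objects, each a list of ((x, y), probability) instances.
    Returns, per object, the probability that it is a hull vertex.
    The number of worlds is the product of the instance counts, which is
    exactly the blow-up that smarter algorithms must avoid.
    """
    probs = [0.0] * len(objects)
    for world in product(*objects):
        weight = 1.0
        for _, p in world:
            weight *= p  # independence assumption across objects
        hull = set(convex_hull([pt for pt, _ in world]))
        for i, (pt, _) in enumerate(world):
            if pt in hull:
                probs[i] += weight
    return probs


# Three certain objects form a triangle; the fourth is inside it with
# probability 0.75 and outside it with probability 0.25.
objects = [
    [((0, 0), 1.0)],
    [((4, 0), 1.0)],
    [((0, 4), 1.0)],
    [((1, 1), 0.75), ((5, 5), 0.25)],
]
print(hull_probabilities(objects))  # [1.0, 1.0, 1.0, 0.25]
```

With only two possible worlds the enumeration is trivial, but with N objects of k instances each it visits k^N worlds, which is why polynomial-time and sampling-based methods are needed. A query could then, for example, keep the objects whose probability meets a user-chosen threshold; that thresholding step is likewise an assumption of this sketch.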
