Uncertain Spatial Data Management: An Overview

Current technology trends, such as smartphones and other mobile devices, stationary sensors, and satellites, together with a new user mentality of voluntarily sharing enriched location information, produce a flood of geo-spatial and geo-spatiotemporal data. This data flood offers tremendous potential for discovering new and useful knowledge. However, measurements are imprecise, and spatial data is often interpolated between discrete observations. In addition, to reduce communication and bandwidth utilization, data is often subjected to reduction, which eliminates some of the known/recorded values. These issues introduce the notion of uncertainty into spatial data management and raise the imminent need for scalable and flexible solutions. The main scope of this chapter is to survey existing techniques for managing, querying, and mining uncertain spatial data. First, this chapter surveys common data representations for uncertain data, explains the commonly used possible worlds semantics for interpreting an uncertain database, and surveys existing systems that process uncertain data. Then, this chapter defines the notion of probabilistic result semantics to distinguish the task of computing individual object probabilities from that of computing entire result probabilities. This distinction is important because, for many queries, object-level probabilities can be computed efficiently, whereas result-level probabilities are hard to compute. Finally, this chapter introduces a novel paradigm to efficiently answer any kind of query on uncertain data: the Paradigm of Equivalent Worlds, which groups the exponential set of possible database worlds into a polynomial number of sets of equivalent worlds that can be processed efficiently. Examples and use cases of querying uncertain spatial data are provided using uncertain range queries.
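The gap between object-level and result-level probabilities can be shown on a small uncertain range query. The following is a minimal sketch, not the chapter's own method. It assumes a simple discrete model in which each uncertain object is a list of alternative positions `(x, y, probability)` whose probabilities sum to 1, and in which objects are mutually independent. The functions `object_probability`, `possible_worlds`, `result_distribution`, and `count_distribution` are hypothetical names chosen for illustration.

- Object-level probabilities need only one pass over each object's alternatives.
- The exact result-set distribution is obtained here by enumerating every possible world.
- As one illustration in the spirit of grouping equivalent worlds, worlds are grouped by how many objects fall in the range. This yields a Poisson binomial distribution, computed by dynamic programming.

```python
from itertools import product
from math import prod

# Hypothetical discrete uncertainty model (an assumption for illustration):
# each object is a list of (x, y, probability) alternatives summing to 1,
# and different objects are assumed to be mutually independent.
Alternative = tuple[float, float, float]
Rect = tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def in_rect(x: float, y: float, rect: Rect) -> bool:
    """Closed axis-aligned range predicate."""
    xmin, ymin, xmax, ymax = rect
    return xmin <= x <= xmax and ymin <= y <= ymax


def object_probability(obj: list[Alternative], rect: Rect) -> float:
    """Object-level probability that one object lies in rect (linear in its alternatives)."""
    return sum(p for x, y, p in obj if in_rect(x, y, rect))


def possible_worlds(objects: list[list[Alternative]]):
    """Yield (world, probability); a world picks one alternative per object.
    The product of probabilities relies on the independence assumption."""
    for world in product(*objects):
        yield world, prod(p for _, _, p in world)


def result_distribution(objects: list[list[Alternative]], rect: Rect) -> dict[frozenset[int], float]:
    """Result-level semantics: probability of each exact result set.
    Enumerates all worlds, so the cost grows exponentially with the number of objects."""
    dist: dict[frozenset[int], float] = {}
    for world, p in possible_worlds(objects):
        result = frozenset(i for i, (x, y, _) in enumerate(world) if in_rect(x, y, rect))
        dist[result] = dist.get(result, 0.0) + p
    return dist


def count_distribution(objects: list[list[Alternative]], rect: Rect) -> list[float]:
    """Group worlds by the number of objects in rect. The result is a Poisson
    binomial distribution, computed by dynamic programming in O(n^2) time."""
    dist = [1.0]  # before processing any object: count 0 with probability 1
    for obj in objects:
        p = object_probability(obj, rect)
        new = [0.0] * (len(dist) + 1)
        for k, q in enumerate(dist):
            new[k] += q * (1.0 - p)  # this object falls outside the range
            new[k + 1] += q * p      # this object falls inside the range
        dist = new
    return dist


if __name__ == "__main__":
    objects = [
        [(1.0, 1.0, 0.5), (5.0, 5.0, 0.5)],
        [(2.0, 2.0, 0.8), (9.0, 9.0, 0.2)],
        [(8.0, 8.0, 1.0)],
    ]
    rect = (0.0, 0.0, 3.0, 3.0)
    print([object_probability(o, rect) for o in objects])
    print(result_distribution(objects, rect))
    print(count_distribution(objects, rect))
```

The number of possible worlds is the product of the numbers of alternatives. The count-based grouping, by contrast, only tracks n + 1 classes, which is what keeps the dynamic program polynomial. Correlated objects would break the product and the dynamic program as written, so a different model would be needed there.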
