Summarizing level-two topological relations in large spatial datasets

Summarizing topological relations is fundamental to many spatial applications, including spatial query optimization. In this article, we present several novel techniques for constructing cell-density-based spatial histograms that support range (window) summarization restricted to the four most important level-two topological relations: contains, contained, overlap, and disjoint. We first present a novel framework for constructing a multiscale Euler histogram in 2D space that guarantees exact summarization results for aligned windows in constant time. To minimize the storage space of such a multiscale Euler histogram, we present an approximation algorithm with approximation ratio 19/12; the problem is shown to be NP-hard in general. For settings with limited storage, where a multiscale histogram may contain only k Euler histograms, we present an effective algorithm that constructs multiscale histograms achieving high accuracy when approximately summarizing aligned windows. We then present a new constant-time approximation algorithm for querying an Euler histogram that cannot guarantee exact answers. We also investigate the problem of nonaligned windows and the problem of effectively partitioning the data space to support nonaligned window queries. Finally, we extend our techniques to 3D space. Our extensive experiments on both synthetic and real-world datasets demonstrate that the approximate multiscale histogram techniques may improve on the accuracy of existing techniques by several orders of magnitude while retaining cost efficiency, and that the exact multiscale histogram technique requires storage space only linearly proportional to the number of cells for many popular real datasets.
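As background for the constant-time claim above, the following is a minimal sketch of a basic single-scale Euler histogram, not the multiscale construction this article proposes. It assumes objects are axis-aligned rectangles with positive width and height, treated as open sets, lying inside a grid of unit cells; the class name `EulerHistogram` and its methods are hypothetical. Each object increments the bucket of every cell (face), interior grid edge, and interior grid vertex it touches. For an aligned window, faces minus edges plus vertices contributes exactly 1 per intersecting object, so the number of intersecting objects (and hence the number of disjoint ones) can be read from a signed prefix-sum table with four lookups.

```python
import math

class EulerHistogram:
    """Single-scale Euler histogram over an nx-by-ny grid of unit cells (sketch)."""

    def __init__(self, nx, ny):
        # Doubled grid: even index = cell (face), odd index = interior grid line.
        self.w, self.h = 2 * nx - 1, 2 * ny - 1
        self.buckets = [[0] * self.h for _ in range(self.w)]
        self.count = 0
        self.prefix = None

    def insert(self, x1, y1, x2, y2):
        """Add the open rectangle (x1, x2) x (y1, y2); assumes x1 < x2 and y1 < y2."""
        ilo = max(0, 2 * math.floor(x1))
        ihi = min(self.w - 1, 2 * math.ceil(x2) - 2)
        jlo = max(0, 2 * math.floor(y1))
        jhi = min(self.h - 1, 2 * math.ceil(y2) - 2)
        for i in range(ilo, ihi + 1):
            for j in range(jlo, jhi + 1):
                self.buckets[i][j] += 1
        self.count += 1
        self.prefix = None  # invalidate the cached prefix sums

    def _build_prefix(self):
        # Signed 2D prefix sums: faces and vertices count +1, edges count -1.
        P = [[0] * (self.h + 1) for _ in range(self.w + 1)]
        for i in range(self.w):
            for j in range(self.h):
                sign = 1 if (i + j) % 2 == 0 else -1
                P[i + 1][j + 1] = (sign * self.buckets[i][j]
                                   + P[i][j + 1] + P[i + 1][j] - P[i][j])
        self.prefix = P

    def intersecting(self, cx1, cy1, cx2, cy2):
        """Objects intersecting the aligned window of cells [cx1, cx2) x [cy1, cy2)."""
        if self.prefix is None:
            self._build_prefix()
        P = self.prefix
        ilo, ihi = 2 * cx1, 2 * cx2 - 2
        jlo, jhi = 2 * cy1, 2 * cy2 - 2
        return (P[ihi + 1][jhi + 1] - P[ilo][jhi + 1]
                - P[ihi + 1][jlo] + P[ilo][jlo])

    def disjoint(self, cx1, cy1, cx2, cy2):
        """Objects with no point in common with the window interior."""
        return self.count - self.intersecting(cx1, cy1, cx2, cy2)


hist = EulerHistogram(4, 4)
hist.insert(0.5, 0.5, 2.5, 1.5)   # spans several cells
hist.insert(3.2, 3.2, 3.8, 3.8)   # inside the top-right cell
hist.insert(1.0, 1.0, 2.0, 2.0)   # exactly one cell
print(hist.intersecting(0, 0, 2, 2))  # 2
print(hist.disjoint(0, 0, 2, 2))      # 1
```

Such a histogram alone separates only disjoint from non-disjoint objects. Distinguishing contains, contained, and overlap among the intersecting objects needs additional information, which is the problem the multiscale techniques summarized above address.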
