Exploring spatial datasets with histograms

As online spatial datasets grow in both number and sophistication, it becomes increasingly difficult for users to decide whether a dataset is suitable for their tasks, especially when they have no prior knowledge of it. In this paper, we propose browsing as an effective and efficient way to explore the content of a spatial dataset. Browsing lets users view the size of a result set before the query is evaluated at the database, thereby avoiding zero-hit and mega-hit queries and saving time and resources. Although the technique underlying browsing is similar to range query aggregation and selectivity estimation, spatial dataset browsing poses some unique challenges. We identify a set of spatial relations that browsing applications need to support, namely the contains, contained, and overlap relations. We prove a lower bound on the storage required to answer queries about the contains relation accurately at a given resolution. We then present three storage-efficient approximation algorithms, which we believe to be the first to estimate query results for these spatial relations. We evaluate these algorithms on both synthetic and real-world datasets and show that they provide highly accurate estimates for datasets with various characteristics.
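
To make the three relations concrete, the following is a minimal sketch of the exact result-set sizes that a browsing summary would approximate. It is not one of the approximation algorithms from the paper: it scans every object by brute force. The names `Rect`, `covers`, `intersects`, and `relation_counts` are hypothetical, and so are the assumptions that objects and queries are axis-aligned rectangles, that "contains" means the query covers the object, and that "contained" means the object covers the query.

```python
from typing import Iterable, NamedTuple


class Rect(NamedTuple):
    """Axis-aligned rectangle with x1 <= x2 and y1 <= y2 (illustrative type)."""
    x1: float
    y1: float
    x2: float
    y2: float


def covers(a: Rect, b: Rect) -> bool:
    """True if a fully covers b; shared boundaries count as covered."""
    return a.x1 <= b.x1 and a.y1 <= b.y1 and a.x2 >= b.x2 and a.y2 >= b.y2


def intersects(a: Rect, b: Rect) -> bool:
    """True if a and b share at least one point."""
    return a.x1 <= b.x2 and b.x1 <= a.x2 and a.y1 <= b.y2 and b.y1 <= a.y2


def relation_counts(query: Rect, objects: Iterable[Rect]) -> dict:
    """Count objects by their relation to the query (exact, O(n) scan).

    Assumed semantics: an object identical to the query is counted
    under "contains", because that case is tested first.
    """
    counts = {"contains": 0, "contained": 0, "overlap": 0, "disjoint": 0}
    for obj in objects:
        if covers(query, obj):
            counts["contains"] += 1      # query covers the object
        elif covers(obj, query):
            counts["contained"] += 1     # object covers the query
        elif intersects(query, obj):
            counts["overlap"] += 1       # partial intersection only
        else:
            counts["disjoint"] += 1
    return counts


query = Rect(0, 0, 10, 10)
objects = [
    Rect(2, 2, 4, 4),        # inside the query
    Rect(-5, -5, 20, 20),    # surrounds the query
    Rect(8, 8, 12, 12),      # partially intersects the query
    Rect(20, 20, 30, 30),    # far from the query
]
print(relation_counts(query, objects))
```

A scan like this touches every object for every query, which is the cost that a compact precomputed summary avoids by reporting estimated counts before the query reaches the database.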
