An Incremental Refining Spatial Join Algorithm for Estimating Query Results in GIS

Geographic information systems (GIS) must support large georeferenced data sets. Because of the size of these data sets, finding exact answers to spatial queries can be very time-consuming. We present an incrementally refining spatial join algorithm that reports query result estimates while simultaneously providing incrementally refined confidence intervals for those estimates. This approach allows for more interactive data exploration. While similar work has been done in relational databases, to the best of our knowledge this is the first work to apply the approach in GIS. We investigate different sampling methodologies and evaluate them through extensive experimental performance comparisons. Experiments on real and synthetic data show an order-of-magnitude improvement in response time relative to computing the exact answer with the R-tree join.
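The general idea of reporting an estimate together with a confidence interval that tightens as more samples arrive can be illustrated with a minimal sketch. This is not the algorithm of the paper: it assumes axis-aligned rectangles, uniform sampling of pairs with replacement, and a normal-approximation interval, and the names `intersects` and `incremental_join_estimate` are hypothetical.

```python
import math
import random


def intersects(a, b):
    """True if two axis-aligned rectangles (xmin, ymin, xmax, ymax) overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def incremental_join_estimate(r, s, batches=10, batch_size=1000, z=1.96, seed=0):
    """Yield (samples_so_far, estimated_join_size, half_width) after each batch.

    Assumption: pairs are drawn uniformly with replacement from r x s, and the
    interval is the usual normal approximation for a sampled proportion.
    """
    rng = random.Random(seed)
    total_pairs = len(r) * len(s)
    n = 0
    hits = 0
    for _ in range(batches):
        for _ in range(batch_size):
            if intersects(rng.choice(r), rng.choice(s)):
                hits += 1
        n += batch_size
        p = hits / n  # running fraction of sampled pairs that intersect
        half_width = z * math.sqrt(p * (1.0 - p) / n) * total_pairs
        yield n, p * total_pairs, half_width
```

A caller can consume the generator and stop as soon as the half-width is small enough for the task at hand, which is what makes this style of estimation suitable for interactive exploration. A real system would likely sample through a spatial index rather than from the full cross product.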
