Efficient techniques for range search queries on earth science data

We consider the problem of organizing large scale earth science raster data to efficiently handle queries for identifying regions whose parameters fall within certain range values specified by the queries. This problem seems to be critical to enabling basic data mining tasks such as determining associations between physical phenomena and spatial factors, detecting changes and trends, and content based retrieval. We assume that the input is too large to fit in internal memory and hence focus on data structures and algorithms that minimize the I/O bounds. A new data structure, called a tree-of-regions (ToR), is introduced and involves a combination of an R-tree and efficient representation of regions. It is shown that such a data structure enables the handling of range queries in an optimal I/O time, under certain reasonable assumptions. We also show that updates to the ToR can be handled efficiently. Experimental results for a variety of multi-valued earth science data illustrate the fast execution times of a wide range of queries, as predicted by our theoretical analysis.

[1]  Sridhar Ramaswamy,et al.  The P-range tree: a new data structure for range searching in secondary memory , 1995, SODA '95.

[2]  Lars Arge,et al.  The Buffer Tree: A New Technique for Optimal I/O-Algorithms (Extended Abstract) , 1995, WADS.

[3]  Hans-Peter Kriegel,et al.  The R*-tree: an efficient and robust access method for points and rectangles , 1990, SIGMOD '90.

[4]  Sridhar Ramaswamy,et al.  Indexing for Data Models with Constraints and Classes , 1996, J. Comput. Syst. Sci..

[5]  Christos Faloutsos,et al.  Hilbert R-tree: An Improved R-tree using Fractals , 1994, VLDB.

[6]  Raj Jain,et al.  Algorithms and strategies for similarity retrieval , 1996 .

[7]  Christian Böhm,et al.  Improving the Query Performance of High-Dimensional Index Structures by Bulk-Load Operations , 1998, EDBT.

[8]  Alok Aggarwal,et al.  The input/output complexity of sorting and related problems , 1988, CACM.

[9]  Jiong Yang,et al.  STING: A Statistical Information Grid Approach to Spatial Data Mining , 1997, VLDB.

[10]  Oliver Günther,et al.  Multidimensional access methods , 1998, CSUR.

[11]  Christos Faloutsos,et al.  The R+-Tree: A Dynamic Index for Multi-Dimensional Objects , 1987, VLDB.

[12]  Jeffrey Scott Vitter,et al.  On two-dimensional indexability and optimal range search indexing , 1999, PODS '99.

[13]  Mario A. López,et al.  STR: a simple and efficient algorithm for R-tree packing , 1997, Proceedings 13th International Conference on Data Engineering.

[14]  Jeffrey Scott Vitter,et al.  External memory algorithms , 1998, ESA.

[15]  Klaus H. Hinrichs,et al.  Efficient Bulk Operations on Dynamic R-Trees , 1999, Algorithmica.

[16]  Hans-Peter Kriegel,et al.  A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise , 1996, KDD.

[17]  Nick Roussopoulos,et al.  Direct spatial search on pictorial databases using packed R-trees , 1985, SIGMOD Conference.

[18]  K. Hinrichs,et al.  E cient Bulk Operations on Dynamic R-trees , 1999 .

[19]  Hans-Peter Kriegel,et al.  The X-tree : An Index Structure for High-Dimensional Data , 2001, VLDB.

[20]  Bernhard Seeger,et al.  A Generic Approach to Bulk Loading Multidimensional Index Structures , 1997, VLDB.

[21]  Antonin Guttman,et al.  R-trees: a dynamic index structure for spatial searching , 1984, SIGMOD '84.

[22]  Christos Faloutsos,et al.  On packing R-trees , 1993, CIKM '93.

[23]  Xiaoyang Sean Wang,et al.  Value Range Queries on Earth Science Data via Histogram Clustering , 2000, TSDM.