Value Range Queries on Earth Science Data via Histogram Clustering

Remote sensing data, ground-based observations, and model output about the Earth system can be very large in volume. To use these data efficiently, scientists need to search them not only by metadata but also by actual data values. Answering value range queries by scanning the full volume of data is impractical. This article studies a technique that clusters histograms of the data values in predefined cells and uses the clusters to index the cells. With this index, so-called statistical range queries can be answered quickly and approximately, together with an assessment of the accuracy of the answer. Examples of applying the technique to Earth science data sets are given.
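
The general idea can be illustrated with a minimal Python sketch. It is not the article's algorithm: the k-means-style clustering, the L1 distance, the farthest-first initialization, and every name in it (`histogram`, `build_index`, `query`, the cell ids, the bin edges) are assumptions chosen for illustration. Each cell is summarized by a normalized histogram, similar histograms are grouped, and a query of the form "which cells have at least a given fraction of their values in [lo, hi)?" is answered from the cluster centroids alone. The largest L1 distance from a member histogram to its centroid serves as a loose but valid error bound on the estimated fraction.

```python
from bisect import bisect_right

def histogram(values, edges):
    """Fraction of a cell's values in each bin [edges[i], edges[i+1])."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        if edges[0] <= v < edges[-1]:          # values outside the bins are ignored
            counts[bisect_right(edges, v) - 1] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(counts)

def l1(a, b):
    """L1 distance between two histograms."""
    return sum(abs(x - y) for x, y in zip(a, b))

def cluster(hists, k, iters=20):
    """k-means-style clustering of histograms (assumed method, for illustration)."""
    # Deterministic farthest-first initialization of the centroids.
    centroids = [list(hists[0])]
    while len(centroids) < k:
        far = max(hists, key=lambda h: min(l1(h, c) for c in centroids))
        centroids.append(list(far))
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: l1(h, centroids[j])) for h in hists]
        for j in range(k):
            members = [h for h, lab in zip(hists, labels) if lab == j]
            if members:                         # empty clusters keep their centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    # Final assignment, consistent with the final centroids.
    labels = [min(range(k), key=lambda j: l1(h, centroids[j])) for h in hists]
    return centroids, labels

def build_index(cells, edges, k):
    """Index cells by cluster: centroid histogram, error radius, member cell ids."""
    ids = list(cells)
    hists = [histogram(cells[c], edges) for c in ids]
    centroids, labels = cluster(hists, k)
    index = []
    for j, cen in enumerate(centroids):
        members = [i for i, lab in enumerate(labels) if lab == j]
        radius = max((l1(hists[i], cen) for i in members), default=0.0)
        index.append({"centroid": cen, "radius": radius,
                      "cells": [ids[i] for i in members]})
    return index

def query(index, edges, lo, hi, min_fraction):
    """Cells whose estimated fraction of values in [lo, hi) is >= min_fraction.

    lo and hi must coincide with bin edges (ValueError otherwise). Returns
    (cell id, estimated fraction, error bound) without touching the raw data.
    """
    a, b = edges.index(lo), edges.index(hi)
    hits = []
    for cl in index:
        est = sum(cl["centroid"][a:b])
        if est >= min_fraction:
            hits.extend((c, est, cl["radius"]) for c in cl["cells"])
    return hits

# Hypothetical toy data: four cells, values binned on [0, 20).
cells = {"a": [1, 2, 3, 4], "b": [1, 2, 2, 3],
         "c": [11, 12, 13, 14], "d": [12, 13, 13, 19]}
edges = [0, 5, 10, 15, 20]
index = build_index(cells, edges, k=2)
hits = query(index, edges, lo=10, hi=15, min_fraction=0.5)
```

In this toy example the query returns cells `c` and `d` with a shared estimate and error bound taken from their cluster. The bound holds because the difference between a member's fraction and the centroid's fraction over any set of bins cannot exceed the L1 distance between the two histograms. A tighter bound, or a query range that does not align with bin edges, would need additional bookkeeping that this sketch omits.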
