The Chunk-Locality Index: An Efficient Query Method for Climate Datasets

Geoscientists frequently need to query large multidimensional array-based datasets, and indexing is one of the most effective ways to accelerate such queries. Focusing on climate datasets, we propose a novel indexing method called the chunk-locality index. Its main idea is to exploit the spatiotemporal similarity of the data in climate datasets. We evaluate the chunk-locality index at various chunk sizes on two practical climate datasets and compare the results with the bitmap index. The comparison shows that the chunk-locality index outperforms the bitmap index not only in query efficiency but also in index building time and index size.
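The abstract does not describe the layout of the chunk-locality index, so the following is only a minimal sketch of the general idea that value locality within chunks can speed up range queries. It is not the authors' method: it swaps in per-chunk minimum/maximum summaries (often called zone maps), and the function names `build_chunk_summaries` and `range_query`, the one-dimensional layout, and the sample temperature values are all assumptions made for illustration.

```python
from typing import List, Tuple


def build_chunk_summaries(values: List[float], chunk_size: int) -> List[Tuple[float, float]]:
    """Return a (min, max) summary for each consecutive chunk of `values`."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    summaries = []
    for start in range(0, len(values), chunk_size):
        chunk = values[start:start + chunk_size]
        summaries.append((min(chunk), max(chunk)))
    return summaries


def range_query(
    values: List[float],
    summaries: List[Tuple[float, float]],
    chunk_size: int,
    lo: float,
    hi: float,
) -> Tuple[List[int], int]:
    """Return (positions with lo <= value <= hi, number of chunks scanned)."""
    hits = []
    scanned = 0
    for i, (cmin, cmax) in enumerate(summaries):
        if cmax < lo or cmin > hi:
            continue  # the summary proves no value in this chunk can match
        scanned += 1
        start = i * chunk_size
        for offset, v in enumerate(values[start:start + chunk_size]):
            if lo <= v <= hi:
                hits.append(start + offset)
    return hits, scanned


# Hypothetical, smoothly varying temperature series (illustrative values only).
temps = [270.0, 270.5, 271.0, 271.2,
         280.0, 280.4, 281.0, 281.1,
         290.0, 290.3, 291.0, 291.5]
summaries = build_chunk_summaries(temps, chunk_size=4)
hits, scanned = range_query(temps, summaries, 4, 280.0, 281.0)
print(hits, scanned)  # prints "[4, 5, 6] 1"
```

Because neighbouring values in this series are similar, each chunk covers a narrow value range and two of the three chunks are skipped without being read. With data that lacks such locality, most chunks would overlap the query range and the summaries would prune little.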
