Paper information

The Chunk-Locality Index: An Efficient Query Method for Climate Datasets

Geoscientists constantly need to query large-scale, multidimensional, array-based datasets. Indexing is the most efficient way to accelerate such queries. We focus on climate datasets and propose a novel and efficient indexing method called the chunk-locality index. The main idea of this method is to exploit the spatial-temporal data similarity found in climate datasets. We evaluate the performance of the chunk-locality index at various chunk sizes on two practical climate datasets and compare the results with the bitmap index. The comparison shows that the chunk-locality index outperforms the bitmap index not only in query efficiency but also in index building time and index size.

Guangwen Yang | Cheng Chen | Haohuan Fu | Xiaomeng Huang
