Chunked Extendible Dense Arrays for Scientific Data Storage

Several meetings of the Extremely Large Databases Community for large-scale scientific applications advocate the use of multidimensional arrays as the appropriate model for representing scientific databases. Scientific databases gradually grow to massive sizes, on the order of terabytes and petabytes. Storing such databases therefore requires efficient dynamic storage schemes in which the array is allowed to extend the bounds of its dimensions arbitrarily. Conventional multidimensional array representations cannot extend or shrink their bounds without relocating elements of the dataset. In general, extendibility of the bounds is limited to only one dimension. This paper presents a technique for storing dense multidimensional arrays by chunks such that the array can be extended along any dimension without compromising the access time for an element. This is done with a computed access mapping function that maps the k-dimensional index onto a linear index of the storage locations. This concept forms the basis for the implementation of an array file of any number of dimensions, where the bounds of the array can be extended arbitrarily. Such a feature currently exists in the Hierarchical Data Format version 5 (HDF5). However, extending the bound of a dimension in an HDF5 array file can be unusually expensive in time. In our storage scheme for dense array files, such extensions can be performed while elements of the array are still accessed orders of magnitude faster than in HDF5 or conventional array files.
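To make the idea of a computed access mapping function concrete, the following is a minimal Python sketch of one way such a function could work. It is an illustration under stated assumptions, not the paper's actual implementation: the class name `ExtendibleIndexMap`, its methods, and the per-dimension "axial" record lists are hypothetical. Each extension appends a new segment at the end of the linear address space and records, for the extended dimension, the first new index, the segment's start address, and the strides in use at that time. An element's address is found by locating, for each coordinate, the extension that introduced it, and then using the most recent of those segments. In a chunked scheme, the same mapping would presumably be applied to chunk coordinates, with a conventional row-major layout inside each chunk.

```python
from bisect import bisect_right
from math import prod


class ExtendibleIndexMap:
    """Maps k-dimensional indices to linear addresses; extending any
    dimension never changes the address of an existing element.
    (Hypothetical illustration, not the paper's code.)"""

    def __init__(self, shape):
        self.bounds = list(shape)
        k = len(shape)
        # Per dimension: first index of each extension, and a parallel
        # list of (segment start address, strides) records.
        self.firsts = [[0] for _ in range(k)]
        # Sentinel start address -1 marks "no segment owned" for the
        # initial records of dimensions 1..k-1.
        self.records = [[(-1, None)] for _ in range(k)]
        # The initial block is treated as a segment owned by dimension 0.
        self.records[0] = [(0, self._strides(0))]
        self.size = prod(shape)

    def _strides(self, dim):
        # Row-major strides with `dim` as the most significant dimension,
        # computed from the bounds of the other dimensions right now.
        k = len(self.bounds)
        order = [dim] + [j for j in range(k) if j != dim]
        strides = [0] * k
        step = 1
        for j in reversed(order):
            strides[j] = step
            step *= self.bounds[j]
        return strides

    def extend(self, dim, by):
        # Append a new segment holding `by` new hyperplanes along `dim`.
        self.firsts[dim].append(self.bounds[dim])
        self.records[dim].append((self.size, self._strides(dim)))
        self.size += by * prod(
            b for j, b in enumerate(self.bounds) if j != dim
        )
        self.bounds[dim] += by

    def address(self, index):
        best = None
        for j, i in enumerate(index):
            if not 0 <= i < self.bounds[j]:
                raise IndexError(f"index {i} out of bounds in dimension {j}")
            # Find the extension of dimension j that introduced index i.
            pos = bisect_right(self.firsts[j], i) - 1
            start, strides = self.records[j][pos]
            # The latest such extension (largest start address) owns
            # the element.
            if best is None or start > best[2]:
                best = (j, self.firsts[j][pos], start, strides)
        dim, first, start, strides = best
        offset = (index[dim] - first) * strides[dim]
        offset += sum(
            index[j] * strides[j] for j in range(len(index)) if j != dim
        )
        return start + offset


if __name__ == "__main__":
    m = ExtendibleIndexMap((2, 3))   # initial 2 x 3 array, addresses 0..5
    m.extend(1, 1)                   # now 2 x 4, new segment at 6..7
    m.extend(0, 1)                   # now 3 x 4, new segment at 8..11
    print(m.address((1, 2)), m.address((0, 3)), m.address((2, 3)))
```

In this sketch a lookup costs one binary search per dimension over that dimension's extension history, so access time depends on the number of dimensions and the number of extensions rather than on the array size, and no existing element is relocated when a bound grows.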
