Efficient organization of large multidimensional arrays

Large multidimensional arrays are widely used in scientific and engineering database applications. The authors present methods of organizing arrays so that access on secondary and tertiary memory devices is fast and efficient. They have developed four techniques: (1) storing the array in multidimensional "chunks" to minimize the number of blocks fetched, (2) reordering the chunked array to minimize seek distance between accessed blocks, (3) maintaining redundant copies of the array, each organized for a different chunk size and ordering, and (4) partitioning the array onto platters of a tertiary memory device so as to minimize the number of platter switches. Measurements on real data obtained from global change scientists show that accesses on arrays organized using these techniques are often an order of magnitude faster than on the unoptimized data.
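
As a rough illustration of technique (1), the sketch below counts how many chunks a rectangular range query intersects under two chunk shapes of equal size. It is not the authors' implementation. The function name `chunks_touched`, the 1024 x 1024 array, and the chunk shapes are assumptions chosen for the example, and each chunk is assumed to occupy exactly one storage block.

```python
from math import prod

def chunks_touched(lo, hi, chunk_shape):
    """Count the chunks intersected by the half-open box [lo, hi).

    lo, hi, and chunk_shape are per-dimension tuples of equal length,
    with hi[d] > lo[d]. Hypothetical helper: one chunk is assumed to
    map to one storage block.
    """
    return prod(
        (h - 1) // c - l // c + 1   # chunks spanned along one dimension
        for l, h, c in zip(lo, hi, chunk_shape)
    )

# A 32-column-wide slab through all 1024 rows of a 1024 x 1024 array.
query_lo, query_hi = (0, 0), (1024, 32)

# Row-major layout viewed as 1 x 1024 chunks (one full row per block).
row_major = chunks_touched(query_lo, query_hi, (1, 1024))

# Square 32 x 32 chunks holding the same number of elements per block.
square = chunks_touched(query_lo, query_hi, (32, 32))

print(row_major, square)
```

Under these assumptions the column slab touches every row block in the row-major layout but only one column of square chunks, which is the kind of reduction in blocks fetched that chunking aims for. A query along rows would favor the row-major shape instead, which is one reason a chunk shape is best chosen from the expected access pattern.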
