History-Pattern Implementation for Large-Scale Dynamic Multidimensional Datasets and Its Evaluations

In this paper, we present a novel encoding/decoding method for dynamic multidimensional datasets and its implementation scheme. Our method encodes an n-dimensional tuple into a pair of scalar values, even when n is large. It encodes and decodes tuples using only shift and AND/OR register instructions. One of the most serious problems in multidimensional-array-based tuple encoding is that the size of an encoded result often exceeds the machine word size for large-scale tuple sets. Our scheme resolves this problem efficiently. We confirmed the advantages of our scheme through analytical and experimental evaluations. The experimental evaluations compared our prototype system with two other systems: (1) a system based on a similar encoding scheme called history-offset encoding, and (2) the PostgreSQL RDBMS. In most cases, our system significantly outperformed the other systems in both storage cost and retrieval cost.
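To make the two technical points above concrete, the following minimal sketch illustrates (a) why an array-offset encoding can exceed a 64-bit machine word and (b) what encoding and decoding a tuple with only shift and AND/OR operations looks like. It is not the history-pattern scheme itself, whose details are not given in this abstract: it assumes a fixed bit width per dimension, and the names `max_offset_bits`, `pack`, and `unpack` are hypothetical.

```python
from math import prod
from typing import List, Sequence


def max_offset_bits(sizes: Sequence[int]) -> int:
    """Bits needed to hold the largest linear offset of an array
    whose dimension sizes are given by `sizes`."""
    return (prod(sizes) - 1).bit_length()


def pack(coords: Sequence[int], widths: Sequence[int]) -> int:
    """Pack coordinates into one bit pattern using only shift and OR.

    Assumption for illustration: dimension i uses a fixed widths[i] bits.
    """
    pattern = 0
    for c, w in zip(coords, widths):
        if c < 0 or (c >> w) != 0:
            raise ValueError(f"coordinate {c} does not fit in {w} bits")
        pattern = (pattern << w) | c
    return pattern


def unpack(pattern: int, widths: Sequence[int]) -> List[int]:
    """Recover the coordinates using only shift and AND."""
    coords = []
    for w in reversed(widths):
        coords.append(pattern & ((1 << w) - 1))  # mask off the lowest w bits
        pattern >>= w
    coords.reverse()
    return coords


if __name__ == "__main__":
    # Ten dimensions of size 1024 already need 100 bits for the offset,
    # which is more than a 64-bit machine word can hold.
    print(max_offset_bits([1024] * 10))

    widths = (4, 4, 4)
    code = pack((3, 5, 2), widths)
    print(code, unpack(code, widths))
```

Python integers are arbitrary precision, so the overflow is only reported here as a bit count; in a language with fixed-width words the same offset would not fit in a single register.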
