Exploiting the Multi-Append-Only-Trend Property of Historical Data in Data Warehouses

Data warehouses maintain historical information to enable the discovery of trends and developments over time. Hence, data items usually contain time-related attributes, such as the time of a sales transaction or the order and shipping dates of a product. Furthermore, the values of these time-related attributes tend to increase over time. We refer to this as the Multi-Append-Only-Trend (MAOT) property. In this paper, we formalize the notion of MAOT and show how taking advantage of this property can improve query performance considerably. We focus on range aggregate queries, which are essential for summarizing large data sets. Compared to MOLAP data cubes, the proposed technique dramatically reduces the amount of pre-computation and hence the additional storage required.
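
To make the idea of a range aggregate query over append-only data concrete, the following is a minimal sketch in one time dimension. It is not the technique proposed in the paper, which the abstract does not specify. It assumes a simplified setting in which records arrive in strictly non-decreasing timestamp order, and it uses a plain prefix-sum array. The class name `AppendOnlyRangeSum` and its methods are hypothetical names chosen for illustration.

```python
from bisect import bisect_left, bisect_right


class AppendOnlyRangeSum:
    """Illustrative range-sum index for a single time attribute.

    Assumption (not from the paper): timestamps arrive in non-decreasing
    order, so the timestamp list stays sorted without any reorganization.
    """

    def __init__(self):
        self.times = []    # timestamps, sorted because appends arrive in order
        self.prefix = [0]  # prefix[i] = sum of the first i measure values

    def append(self, t, value):
        # Reject out-of-order data; this sketch covers only the strict
        # append-only case, not a general "trend" with late arrivals.
        if self.times and t < self.times[-1]:
            raise ValueError("timestamp arrived out of order")
        self.times.append(t)
        self.prefix.append(self.prefix[-1] + value)

    def range_sum(self, t_lo, t_hi):
        """Return the sum of values whose timestamp t satisfies t_lo <= t <= t_hi."""
        lo = bisect_left(self.times, t_lo)
        hi = bisect_right(self.times, t_hi)
        if hi <= lo:
            return 0  # empty or inverted range
        return self.prefix[hi] - self.prefix[lo]


sales = AppendOnlyRangeSum()
for t, amount in [(1, 10), (2, 5), (2, 7), (5, 3)]:
    sales.append(t, amount)

print(sales.range_sum(2, 4))   # sum of the two sales at time 2
print(sales.range_sum(0, 10))  # sum over all sales
```

In this simplified setting an append costs constant time and a range query costs two binary searches, with only one extra number stored per record. Handling several time attributes at once, or values that merely tend to increase, requires more than this sketch provides.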
