Empowering the OLAP Technology to Support Complex Dimension Hierarchies

Comprehensive data analysis has become indispensable in a variety of domains. However, OLAP (On-Line Analytical Processing) systems tend to perform poorly or even fail when applied to complex data scenarios. The restriction of the underlying multidimensional data model to homogeneous and balanced dimension hierarchies is too rigid for many real-world applications and therefore has to be overcome in order to provide adequate OLAP support. We present a framework for classifying and modeling complex multidimensional data, with the major effort placed at the conceptual level on transforming irregular hierarchies so that they can be navigated in a uniform manner. The properties of various hierarchy types are formalized, and a two-phase normalization approach is proposed: heterogeneous dimensions are first reshaped into a set of well-behaved homogeneous subdimensions, and summarizability is then enforced in each dimension’s data hierarchy. Mapping the data to a visual data browser relies solely on metadata, which captures the properties of facts, dimensions, and relationships within the dimensions. Navigation is schema-based; that is, users interact with dimensional levels, and data is displayed on demand. The power of our approach is exemplified using a real-world study from the domain of academic administration.
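To illustrate why summarizability matters, the following minimal sketch checks two commonly cited preconditions for a single roll-up between two levels: strictness (each child member has at most one parent) and completeness (each child member has some parent). It is not the normalization procedure of the paper. The function names, the staff-to-department example, and the expense figures are hypothetical and chosen only to echo the academic administration setting.

```python
from collections import defaultdict


def check_rollup(children, edges):
    """Return (is_strict, is_complete) for a child-to-parent roll-up.

    children: all members of the lower level.
    edges:    (child, parent) pairs describing the roll-up relationship.
    """
    parents = defaultdict(set)
    for child, parent in edges:
        parents[child].add(parent)
    # Strict: no child rolls up to more than one parent.
    is_strict = all(len(p) <= 1 for p in parents.values())
    # Complete: every child rolls up to at least one parent.
    is_complete = all(c in parents for c in children)
    return is_strict, is_complete


def roll_up(measures, edges):
    """Naively sum child measures per parent along the given edges."""
    totals = defaultdict(int)
    for child, parent in edges:
        totals[parent] += measures[child]
    return dict(totals)


# Hypothetical data: bob belongs to two departments, eve to none.
staff = ["ann", "bob", "eve"]
edges = [("ann", "math"), ("bob", "math"), ("bob", "cs")]
expenses = {"ann": 10, "bob": 20, "eve": 5}

strict, complete = check_rollup(staff, edges)
totals = roll_up(expenses, edges)

print(strict, complete)        # both conditions are violated here
print(totals)                  # per-department sums
print(sum(totals.values()), sum(expenses.values()))  # the totals disagree
```

In this example the department totals add up to more than the staff-level total for bob (counted twice) while eve's expenses are lost entirely, which is the kind of inconsistency that enforcing summarizability is meant to rule out.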
