A survey on summarizability issues in multidimensional modeling

The development of a data warehouse (DW) system is based on a conceptual multidimensional model, which provides a high level of abstraction for describing real-world situations accurately and expressively. Once this model is designed, the corresponding logical representation must be derived as the basis for implementing the DW with one specific technology. However, even when a well-designed conceptual multidimensional model underlies a DW, there is a semantic gap between this model and its logical representation. In particular, this gap complicates the adequate treatment of summarizability issues, which in turn may lead data analysis tools to produce erroneous results. Research addressing this topic has produced only partial solutions, and the differing terminology used by individual research groups hinders further progress. Consequently, based on a unifying vocabulary, this survey sheds light on (i) the strengths and weaknesses of current approaches for modeling complex multidimensional structures that reflect real-world situations in a conceptual multidimensional model and (ii) existing mechanisms for avoiding summarizability problems when conceptual multidimensional models are implemented.
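
As a minimal illustration of the kind of summarizability problem referred to above, the following Python sketch shows how a non-strict hierarchy (a member with more than one parent) can inflate a rolled-up total. It is not taken from the survey; the data and the names `sales`, `product_to_categories`, `roll_up`, and `is_strict` are hypothetical and chosen only for this example.

```python
from collections import defaultdict

# Hypothetical fact data: one sales measure per product.
sales = {"A": 100, "B": 50}

# Hypothetical non-strict hierarchy: product "A" rolls up to two categories.
product_to_categories = {"A": ["Food", "Gifts"], "B": ["Food"]}


def roll_up(facts, mapping):
    """Sum each fact value into every parent of its dimension member."""
    totals = defaultdict(int)
    for member, value in facts.items():
        for parent in mapping[member]:
            totals[parent] += value
    return dict(totals)


def is_strict(mapping):
    """A hierarchy level is strict here if every member has exactly one parent."""
    return all(len(parents) == 1 for parents in mapping.values())


by_category = roll_up(sales, product_to_categories)

# Total computed directly from the facts.
grand_total_from_facts = sum(sales.values())

# Total computed by summing the category-level aggregates:
# product "A" is counted once per category it belongs to.
grand_total_from_categories = sum(by_category.values())

print(by_category)
print(grand_total_from_facts, grand_total_from_categories)
print("strict hierarchy:", is_strict(product_to_categories))
```

Because product "A" contributes to both categories, reusing the category-level aggregates to compute a grand total double counts its sales. A strictness check of this kind is one simple way such a condition could be detected before aggregates are reused.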
