Navigation Rules for Exploring Large Multidimensional Data Cubes

Navigating through multidimensional data cubes is a nontrivial task. Although On-Line Analytical Processing (OLAP) lets users view multidimensional data through roll-up, drill-down, and slice-and-dice operations, it offers end users little guidance in the actual knowledge discovery process. In this article, we address this problem by identifying novel and useful patterns concealed in multidimensional data and using them to explore data cubes effectively. We present an algorithm for the DIscovery of Sk-NAvigation Rules (DISNAR), which discovers these hidden patterns in the form of Sk-navigation rules. It does so by applying a test of skewness to pairs formed by the current lattice node and each of its candidate drill-down nodes. The rules are then used to enhance navigational capabilities, as illustrated by our rule-driven system. Extensive experimental analysis shows that DISNAR discovers the interesting patterns with high recall and precision, short execution time, and low space overhead.
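
The abstract does not spell out DISNAR's skewness test. The following Python sketch is therefore only one plausible reading, built on these assumptions:

- A data cube is modeled as a list of fact rows.
- A lattice node is a tuple of group-by dimensions.
- The candidate drill-down nodes of the current node each add one dimension.
- A drill-down counts as skewed when the adjusted sample skewness G1 of its child aggregates, restricted to the current cell, exceeds its standard error by a normal critical value (z_crit = 1.96).

The function names (aggregate, skewed_drilldowns), the threshold and the toy data are all hypothetical. The standard z-test for skewness stands in for whatever test the authors actually use.

```python
import math
from collections import defaultdict


def aggregate(rows, dims, measure):
    """Roll the fact rows up to one cuboid: group by `dims` and sum `measure`."""
    cells = defaultdict(float)
    for row in rows:
        cells[tuple(row[d] for d in dims)] += row[measure]
    return dict(cells)


def sample_skewness(values):
    """Adjusted Fisher-Pearson skewness G1 (requires at least 3 values)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    if m2 == 0:
        return 0.0  # all child cells are equal, so there is no skew
    g1 = m3 / m2 ** 1.5
    return math.sqrt(n * (n - 1)) / (n - 2) * g1


def skewness_se(n):
    """Standard error of G1 for a sample of size n under normality."""
    return math.sqrt(6.0 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))


def skewed_drilldowns(rows, current_dims, current_cell, all_dims, measure, z_crit=1.96):
    """Pair the current cell with each one-step drill-down and keep the skewed pairs.

    Hypothetical stand-in for DISNAR's test: returns (child_dims, G1, z) for each
    candidate drill-down node whose child aggregates have |z| > z_crit.
    """
    # Restrict the facts to the cell the user is currently looking at.
    in_cell = [r for r in rows
               if tuple(r[d] for d in current_dims) == tuple(current_cell)]
    flagged = []
    for extra in all_dims:
        if extra in current_dims:
            continue
        child_dims = tuple(current_dims) + (extra,)  # candidate drill-down node
        values = list(aggregate(in_cell, child_dims, measure).values())
        n = len(values)
        if n < 3:
            continue  # the skewness test needs at least 3 child cells
        g1 = sample_skewness(values)
        z = g1 / skewness_se(n)
        if abs(z) > z_crit:
            flagged.append((child_dims, g1, z))
    return flagged


# Toy fact table (invented for illustration): in East, product p0 dominates sales;
# West is uniform across products and months.
rows = [
    {"region": reg, "product": f"p{i}", "month": f"m{j}",
     "sales": 100.0 if (reg == "East" and i == 0) else 1.0}
    for reg in ("East", "West") for i in range(10) for j in range(10)
]
dims = ("region", "product", "month")
for region in ("East", "West"):
    for child_dims, g1, z in skewed_drilldowns(rows, ("region",), (region,), dims, "sales"):
        print(f"region={region} -> drill down by {child_dims[-1]} (G1={g1:.2f}, z={z:.2f})")
```

Each flagged (current cell, drill-down node) pair acts as a navigation hint. On the toy data, only region=East produces a hint, and it suggests drilling down by product, where a single product dominates. The simple z-test has little power when there are few child cells, which is one reason a real system might use a different skewness test or threshold.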
