Data Cube Technology

This chapter focuses on data cube technology, in particular on methods for data cube computation and methods for multidimensional data analysis. Data warehouse systems provide online analytical processing (OLAP) tools for interactive analysis of multidimensional data at varied levels of granularity. OLAP tools typically use the data cube and a multidimensional data model to provide flexible access to summarized data. With a data cube, users can explore the data interactively in a multidimensional way through OLAP operations such as drill-down (to see more specialized data, such as total sales per city) and roll-up (to see the data at a more generalized level, such as total sales per country). Although the data cube concept was originally intended for OLAP, it is also useful for data mining. Multidimensional data mining, also known as exploratory multidimensional data mining or online analytical mining (OLAM), integrates OLAP-based data analysis with knowledge discovery techniques. It searches for interesting patterns by exploring the data in multidimensional space: users can interactively drill down or roll up to varying abstraction levels to find classification models, clusters, predictive rules, and outliers. Precomputing a data cube, or parts of one, allows fast access to summarized data. Because most data is high-dimensional, however, multidimensional analysis can run into performance bottlenecks, so it is important to study data cube computation techniques. Data cube technology provides many effective and scalable methods for cube computation. Studying these methods also helps in understanding and further developing scalable methods for other data mining tasks, such as the discovery of frequent patterns.
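As a rough illustration of why cube computation becomes expensive, the following minimal Python sketch precomputes a full data cube by naively aggregating every one of the 2^n group-bys (cuboids) over n dimensions. It is not one of the scalable algorithms the chapter covers. The function name `compute_cube`, the sample `sales` rows, and the dimension names are all assumptions made up for this example.

```python
from collections import defaultdict
from itertools import combinations


def compute_cube(rows, dims, measure):
    """Naively materialize all 2^n cuboids, summing `measure` in each cell.

    Returns {tuple of group-by dimensions: {cell key tuple: summed measure}}.
    Hypothetical helper for illustration; it rescans the rows once per cuboid.
    """
    cube = {}
    for k in range(len(dims) + 1):
        for group in combinations(dims, k):
            cuboid = defaultdict(float)
            for row in rows:
                key = tuple(row[d] for d in group)
                cuboid[key] += row[measure]
            cube[group] = dict(cuboid)
    return cube


# Invented sample data: one row per sale.
sales = [
    {"country": "Canada", "city": "Vancouver", "year": 2023, "amount": 100.0},
    {"country": "Canada", "city": "Toronto", "year": 2023, "amount": 150.0},
    {"country": "USA", "city": "Chicago", "year": 2023, "amount": 200.0},
    {"country": "USA", "city": "Chicago", "year": 2024, "amount": 50.0},
]

cube = compute_cube(sales, ("country", "city", "year"), "amount")

per_city = cube[("country", "city")]  # drill-down: total sales per city
per_country = cube[("country",)]      # roll-up: total sales per country
grand_total = cube[()][()]            # apex cuboid: all dimensions rolled up
```

With three dimensions the sketch materializes 2^3 = 8 cuboids. Each added dimension doubles that count, which suggests why partial materialization and more efficient cubing methods matter for high-dimensional data.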
