Data Warehousing and Online Analytical Processing

This chapter presents an overview of data warehouse and online analytical processing (OLAP) technology, which is essential for understanding the overall data mining and knowledge discovery process. Data warehouses generalize and consolidate data in multidimensional space. Constructing a data warehouse involves data cleaning, data integration, and data transformation, and can be viewed as an important preprocessing step for data mining. Moreover, data warehouses provide OLAP tools for the interactive analysis of multidimensional data at varied granularities, which facilitates effective data generalization and data mining. Many other data mining functions, such as association, classification, prediction, and clustering, can be integrated with OLAP operations to enhance interactive mining of knowledge at multiple levels of abstraction. Hence, the data warehouse has become an increasingly important platform for data analysis and OLAP, and will provide an effective platform for data mining. Data warehousing and OLAP therefore form an essential step in the knowledge discovery process. The chapter focuses on the data cube, a multidimensional data model for data warehouses and OLAP, and on OLAP operations such as roll-up, drill-down, slicing, and dicing. Data warehouse design and usage are also discussed, followed by multidimensional data mining, a powerful paradigm that integrates data warehouse and OLAP technology with data mining. An overview of data warehouse implementation examines general strategies for efficient data cube computation, OLAP data indexing, and OLAP query processing. Finally, the chapter studies data generalization by attribute-oriented induction, a method that uses concept hierarchies to generalize data to multiple levels of abstraction.
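The OLAP operations named above can be illustrated with a minimal sketch in plain Python. Everything in it is an assumption made for illustration rather than material from the chapter: the tiny sales fact table, the city-to-country concept hierarchy, and the helper names `aggregate`, `roll_up_location`, `slice_`, and `dice` are all hypothetical. Roll-up climbs the concept hierarchy before aggregating, drill-down is simply aggregation at the finer (city) level, slicing fixes one dimension to a single value, and dicing restricts several dimensions at once.

```python
from collections import defaultdict

# Hypothetical fact table: (city, quarter, item, units_sold).
DIMS = ("city", "quarter", "item")
FACTS = [
    ("Vancouver", "Q1", "phone", 10),
    ("Vancouver", "Q2", "phone", 15),
    ("Toronto", "Q1", "laptop", 7),
    ("Chicago", "Q1", "phone", 20),
    ("Chicago", "Q2", "laptop", 5),
]

# Hypothetical concept hierarchy for the location dimension: city -> country.
CITY_TO_COUNTRY = {"Vancouver": "Canada", "Toronto": "Canada", "Chicago": "USA"}


def aggregate(facts, dims):
    """Group by the named dimensions and sum the measure (one cuboid)."""
    positions = [DIMS.index(d) for d in dims]
    cells = defaultdict(int)
    for row in facts:
        cells[tuple(row[p] for p in positions)] += row[-1]
    return dict(cells)


def roll_up_location(facts):
    """Roll-up: replace each city by its country (the first column then holds countries)."""
    return [(CITY_TO_COUNTRY[c], q, i, u) for c, q, i, u in facts]


def slice_(facts, dim, value):
    """Slice: keep only the rows where one dimension equals a single value."""
    p = DIMS.index(dim)
    return [row for row in facts if row[p] == value]


def dice(facts, conditions):
    """Dice: keep rows whose values fall in the allowed set for every listed dimension."""
    return [
        row
        for row in facts
        if all(row[DIMS.index(d)] in allowed for d, allowed in conditions.items())
    ]


if __name__ == "__main__":
    # Drill-down view: finest location granularity (city by quarter).
    print(aggregate(FACTS, ("city", "quarter")))
    # Roll-up view: the same measure summarized at the country level.
    print(aggregate(roll_up_location(FACTS), ("city", "quarter")))
    # Slice on quarter = Q1, then summarize by item.
    print(aggregate(slice_(FACTS, "quarter", "Q1"), ("item",)))
    # Dice on two dimensions at once.
    print(dice(FACTS, {"city": {"Vancouver", "Chicago"}, "item": {"phone"}}))
```

A real warehouse would typically precompute and index such cuboids rather than rescan the fact table for each query; the sketch only shows what each operation means on the data.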
