Data mining and automatic OLAP schema generation

Data mining aims to extract previously unidentified information from large databases. It can be viewed as the automated application of algorithms to discover hidden patterns and extract knowledge from data. Online Analytical Processing (OLAP) systems, on the other hand, allow huge datasets to be explored and queried interactively. OLAP systems are the predominant front-end tools in data warehousing environments, and the OLAP market has grown rapidly in the last few years. Several earlier works emphasized the integration of OLAP and data mining. More recently, data mining techniques have been applied together with OLAP in decision support applications to analyze large data sets efficiently. However, to integrate data mining results with OLAP, the data has to be modeled in a particular type of OLAP schema. An OLAP schema is a collection of database objects, including tables, views, indexes, and synonyms. Schema generation was long considered a manual task, but in recent years researchers have reported work on automatic schema generation. In this paper, we review the literature on schema generation techniques and highlight the limitations of existing work. The review found no prior work that integrates automatic schema generation with data mining. Hence, we propose a model for data mining and automatic generation of three types of schema: star, snowflake, and galaxy. The model applies hierarchical clustering, a data mining technique, and generates the schema from the clustered data. We have also developed a prototype of the proposed model and validated it through experiments on a real-life data set. The proposed model is significant because it supports both integration and automation.
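The pipeline outlined in the abstract (cluster the data hierarchically, then generate a schema for the clustered data) can be illustrated with a minimal sketch. This is not the authors' prototype: the single-linkage merge rule, the function names `agglomerative` and `star_schema_ddl`, the sample points, and the table and column names are all assumptions made for illustration, and only the star schema case is shown.

```python
import sqlite3
from itertools import combinations


def agglomerative(points, k):
    """Naive single-linkage agglomerative clustering down to k clusters.

    Returns clusters as sorted lists of row indices into `points`.
    """
    clusters = [[i] for i in range(len(points))]

    def dist(a, b):
        # Single linkage: distance between the closest pair of members.
        return min(
            sum((x - y) ** 2 for x, y in zip(points[i], points[j])) ** 0.5
            for i in a
            for j in b
        )

    while len(clusters) > k:
        # Find the closest pair of clusters and merge the second into the first.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: dist(clusters[p[0]], clusters[p[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]


def star_schema_ddl(fact, measures, dimensions):
    """Build CREATE TABLE statements for one fact table and its dimensions.

    `dimensions` maps a dimension name to its descriptive attributes
    (hypothetical names; a real tool would derive them from the data).
    """
    stmts = []
    for dim, attrs in dimensions.items():
        cols = [f"{dim}_id INTEGER PRIMARY KEY"] + [f"{a} TEXT" for a in attrs]
        stmts.append(f"CREATE TABLE dim_{dim} ({', '.join(cols)});")
    # The fact table holds one foreign key per dimension plus the measures.
    cols = [f"{d}_id INTEGER REFERENCES dim_{d}({d}_id)" for d in dimensions]
    cols += [f"{m} REAL" for m in measures]
    stmts.append(f"CREATE TABLE fact_{fact} ({', '.join(cols)});")
    return stmts


# Toy data: two well-separated groups of two-dimensional records.
points = [(1.0, 1.0), (1.2, 0.9), (8.0, 8.0), (8.1, 7.9)]
clusters = agglomerative(points, 2)

# Generate a star schema for the first cluster and check that it is valid SQL
# by creating the tables in an in-memory SQLite database.
ddl = star_schema_ddl(
    "cluster_0",
    measures=["amount"],
    dimensions={"customer": ["region"], "product": ["category"]},
)
conn = sqlite3.connect(":memory:")
for stmt in ddl:
    conn.execute(stmt)
tables = sorted(
    row[0] for row in conn.execute("SELECT name FROM sqlite_master WHERE type='table'")
)
```

In this sketch each cluster would get its own fact table surrounded by dimension tables. A snowflake variant would further normalize the dimension tables, and a galaxy variant would let several fact tables share dimensions.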
